diff --git a/README.md b/README.md index 8832637..be3be65 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,11 @@

- logo + logo

MIT License - - - - +

--- @@ -16,30 +13,33 @@ **NLP-Models-Tensorflow**, Gathers machine learning and tensorflow deep learning models for NLP problems, **code simplify inside Jupyter Notebooks 100%**. ## Table of contents - * [Text classification](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-classification) - * [Chatbot](https://github.com/huseinzol05/NLP-Models-Tensorflow#chatbot) - * [Neural Machine Translation](https://github.com/huseinzol05/NLP-Models-Tensorflow#neural-machine-translation-english-to-vietnam) - * [Embedded](https://github.com/huseinzol05/NLP-Models-Tensorflow#embedded) - * [Entity-Tagging](https://github.com/huseinzol05/NLP-Models-Tensorflow#entity-tagging) - * [POS-Tagging](https://github.com/huseinzol05/NLP-Models-Tensorflow#pos-tagging) - * [Dependency-Parser](https://github.com/huseinzol05/NLP-Models-Tensorflow#dependency-parser) - * [SQUAD Question-Answers](https://github.com/huseinzol05/NLP-Models-Tensorflow#squad-question-answers) - * [Question-Answers](https://github.com/huseinzol05/NLP-Models-Tensorflow#question-answers) - * [Abstractive Summarization](https://github.com/huseinzol05/NLP-Models-Tensorflow#abstractive-summarization) - * [Extractive Summarization](https://github.com/huseinzol05/NLP-Models-Tensorflow#extractive-summarization) - * [Stemming](https://github.com/huseinzol05/NLP-Models-Tensorflow#stemming) - * [Generator](https://github.com/huseinzol05/NLP-Models-Tensorflow#generator) - * [Topic Generator](https://github.com/huseinzol05/NLP-Models-Tensorflow#topic-generator) - * [Language detection](https://github.com/huseinzol05/NLP-Models-Tensorflow#language-detection) - * [OCR (optical character recognition)](https://github.com/huseinzol05/NLP-Models-Tensorflow#ocr-optical-character-recognition) - * [Sentence-Pair classification](https://github.com/huseinzol05/NLP-Models-Tensorflow#sentence-pair) - * [Speech to Text](https://github.com/huseinzol05/NLP-Models-Tensorflow#speech-to-text) - * [Text to 
Speech](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-to-speech) - * [Old-to-Young Vocoder](https://github.com/huseinzol05/NLP-Models-Tensorflow#old-to-young-vocoder) - * [Text Similarity](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-similarity) - * [Text Augmentation](https://github.com/huseinzol05/NLP-Models-Tensorflow#text-augmentation) - * [Miscellaneous](https://github.com/huseinzol05/NLP-Models-Tensorflow#Miscellaneous) - * [Attention](https://github.com/huseinzol05/NLP-Models-Tensorflow#attention) + * [Abstractive Summarization](#abstractive-summarization) + * [Chatbot](#chatbot) + * [Dependency Parser](#dependency-parser) + * [Entity Tagging](#entity-tagging) + * [Extractive Summarization](#extractive-summarization) + * [Generator](#generator) + * [Language Detection](#language-detection) + * [Neural Machine Translation](#neural-machine-translation) + * [OCR](#ocr-optical-character-recognition) + * [POS Tagging](#pos-tagging) + * [Question-Answers](#question-answers) + * [Sentence Pairs](#sentence-pair) + * [Speech-to-Text](#speech-to-text) + * [Spelling Correction](#spelling-correction) + * [SQUAD Question-Answers](#squad-question-answers) + * [Stemming](#stemming) + * [Text Augmentation](#text-augmentation) + * [Text Classification](#text-classification) + * [Text Similarity](#text-similarity) + * [Text-to-Speech](#text-to-speech) + * [Topic Generator](#topic-generator) + * [Topic Modeling](#topic-modeling) + * [Unsupervised Extractive Summarization](#unsupervised-extractive-summarization) + * [Vectorizer](#vectorizer) + * [Old-to-Young Vocoder](#old-to-young-vocoder) + * [Visualization](#visualization) + * [Attention](#attention) ## Objective @@ -49,125 +49,40 @@ I will attach github repositories for models that I did not implement from scratch ## Tensorflow version -Tensorflow version 1.10 and above only, not included 2.X version. +Tensorflow version 1.13 and above only; 2.X versions are not supported. 
1.13 < Tensorflow < 2.0 + +```bash +pip install -r requirements.txt +``` ## Contents -### [Text classification](text-classification) +### [Abstractive Summarization](abstractive-summarization) -Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data). +Trained on [India news](abstractive-summarization/dataset). -1. Basic cell RNN -2. Bidirectional RNN -3. LSTM cell RNN -4. GRU cell RNN -5. LSTM RNN + Conv2D -6. K-max Conv1d -7. LSTM RNN + Conv1D + Highway -8. LSTM RNN with Attention -9. Neural Turing Machine -10. BERT -11. Dynamic Memory Network -12. XL-net - -
Complete list (76 notebooks) +Accuracy based on 10 epochs only, calculated using word positions. -1. Basic cell RNN -2. Basic cell RNN + Hinge -3. Basic cell RNN + Huber -4. Basic cell Bidirectional RNN -5. Basic cell Bidirectional RNN + Hinge -6. Basic cell Bidirectional RNN + Huber -7. LSTM cell RNN -8. LSTM cell RNN + Hinge -9. LSTM cell RNN + Huber -10. LSTM cell Bidirectional RNN -11. LSTM cell Bidirectional RNN + Huber -12. LSTM cell RNN + Dropout + L2 -13. GRU cell RNN -14. GRU cell RNN + Hinge -15. GRU cell RNN + Huber -16. GRU cell Bidirectional RNN -17. GRU cell Bidirectional RNN + Hinge -18. GRU cell Bidirectional RNN + Huber -19. LSTM RNN + Conv2D -20. K-max Conv1d -21. LSTM RNN + Conv1D + Highway -22. LSTM RNN + Basic Attention -23. LSTM Dilated RNN -24. Layer-Norm LSTM cell RNN -25. Only Attention Neural Network -26. Multihead-Attention Neural Network -27. Neural Turing Machine -28. LSTM Seq2Seq -29. LSTM Seq2Seq + Luong Attention -30. LSTM Seq2Seq + Bahdanau Attention -31. LSTM Seq2Seq + Beam Decoder -32. LSTM Bidirectional Seq2Seq -33. Pointer Net -34. LSTM cell RNN + Bahdanau Attention -35. LSTM cell RNN + Luong Attention -36. LSTM cell RNN + Stack Bahdanau Luong Attention -37. LSTM cell Bidirectional RNN + backward Bahdanau + forward Luong -38. Bytenet -39. Fast-slow LSTM -40. Siamese Network -41. LSTM Seq2Seq + tf.estimator -42. Capsule layers + RNN LSTM -43. Capsule layers + LSTM Seq2Seq -44. Capsule layers + LSTM Bidirectional Seq2Seq -45. Nested LSTM -46. LSTM Seq2Seq + Highway -47. Triplet loss + LSTM -48. DNC (Differentiable Neural Computer) -49. ConvLSTM -50. Temporal Convd Net -51. Batch-all Triplet-loss + LSTM -52. Fast-text -53. Gated Convolution Network -54. Simple Recurrent Unit -55. LSTM Hierarchical Attention Network -56. Bidirectional Transformers -57. Dynamic Memory Network -58. Entity Network -59. End-to-End Memory Network -60. BOW-Chars Deep sparse Network -61. Residual Network using Atrous CNN -62. 
Residual Network using Atrous CNN + Bahdanau Attention -63. Deep pyramid CNN -64. Transformer-XL -65. Transfer learning GPT-2 345M -66. Quasi-RNN -67. Tacotron -68. Slice GRU -69. Slice GRU + Bahdanau -70. Wavenet -71. Transfer learning BERT Base -72. Transfer learning XL-net Large -73. LSTM BiRNN global Max and average pooling -74. Transfer learning BERT Base drop 6 layers -75. Transfer learning BERT Large drop 12 layers -76. Transfer learning XL-net Base +
Complete list (12 notebooks) + +1. LSTM Seq2Seq using topic modelling, test accuracy 13.22% +2. LSTM Seq2Seq + Luong Attention using topic modelling, test accuracy 12.39% +3. LSTM Seq2Seq + Beam Decoder using topic modelling, test accuracy 10.67% +4. LSTM Bidirectional + Luong Attention + Beam Decoder using topic modelling, test accuracy 8.29% +5. Pointer-Generator + Bahdanau, https://github.com/xueyouluo/my_seq2seq, test accuracy 15.51% +6. Copynet, test accuracy 11.15% +7. Pointer-Generator + Luong, https://github.com/xueyouluo/my_seq2seq, test accuracy 16.51% +8. Dilated Seq2Seq, test accuracy 10.88% +9. Dilated Seq2Seq + Self Attention, test accuracy 11.54% +10. BERT + Dilated CNN Seq2seq, test accuracy 13.5% +11. self-attention + Pointer-Generator, test accuracy 4.34% +12. Dilated-CNN Seq2seq + Pointer-Generator, test accuracy 5.57%
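The "calculated using word positions" metric above is not defined precisely in this README; one plausible reading is per-position token agreement with the reference summary. A hypothetical sketch (the function name and the divide-by-reference-length choice are assumptions, not the repo's actual evaluation code):

```python
def position_accuracy(prediction, reference):
    """Share of reference positions where the predicted token matches.

    A guess at the 'word positions' metric described above; extra
    predicted tokens beyond the reference length are simply ignored.
    """
    pred, ref = prediction.split(), reference.split()
    matches = sum(p == r for p, r in zip(pred, ref))
    return matches / max(len(ref), 1)

# Two of three reference positions match.
print(position_accuracy("india news today", "india news summary"))
```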
### [Chatbot](chatbot) -Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/chatbot/dataset.tar.gz). - -1. Seq2Seq-manual -2. Seq2Seq-API Greedy -3. Bidirectional Seq2Seq-manual -4. Bidirectional Seq2Seq-API Greedy -5. Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong -6. Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder -7. Bytenet -8. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder -9. End-to-End Memory Network -10. Attention is All you need -11. Transformer-XL + LSTM -12. GPT-2 + LSTM -13. Tacotron + Beam decoder +Trained on [Cornell Movie Dialog corpus](chatbot/dataset.tar.gz), accuracy table in [chatbot](chatbot).
Complete list (54 notebooks) @@ -220,7 +135,7 @@ Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Mode 47. Attention is all you need + Beam Search 48. Transformer-XL + LSTM 49. GPT-2 + LSTM -50. Fairseq +50. CNN Seq2seq 51. Conv-Encoder + LSTM 52. Tacotron + Greedy decoder 53. Tacotron + Beam decoder @@ -228,103 +143,171 @@ Trained on [Cornell Movie Dialog corpus](https://github.com/huseinzol05/NLP-Mode
-### [Neural Machine Translation](neural-machine-translation) +### [Dependency-Parser](dependency-parser) -Trained on [500 English-Vietnam](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/neural-machine-translation/vietnam-train). +Trained on [CONLL English Dependency](https://github.com/UniversalDependencies/UD_English-EWT). Train set to train, dev and test sets to test. -1. Seq2Seq-manual -2. Seq2Seq-API Greedy -3. Bidirectional Seq2Seq-manual -4. Bidirectional Seq2Seq-API Greedy -5. Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong -6. Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder -7. Bytenet -8. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder -9. End-to-End Memory Network -10. Attention is All you need -11. BERT + Dilated Fairseq +Stackpointer and Biaffine-attention originally from https://github.com/XuezheMax/NeuroNLP2 written in Pytorch. -
Complete list (55 notebooks) +Accuracy based on arc, types and root accuracies after 15 epochs only. -1. Basic cell Seq2Seq-manual -2. LSTM Seq2Seq-manual -3. GRU Seq2Seq-manual -4. Basic cell Seq2Seq-API Greedy -5. LSTM Seq2Seq-API Greedy -6. GRU Seq2Seq-API Greedy -7. Basic cell Bidirectional Seq2Seq-manual -8. LSTM Bidirectional Seq2Seq-manual -9. GRU Bidirectional Seq2Seq-manual -10. Basic cell Bidirectional Seq2Seq-API Greedy -11. LSTM Bidirectional Seq2Seq-API Greedy -12. GRU Bidirectional Seq2Seq-API Greedy -13. Basic cell Seq2Seq-manual + Luong Attention -14. LSTM Seq2Seq-manual + Luong Attention -15. GRU Seq2Seq-manual + Luong Attention -16. Basic cell Seq2Seq-manual + Bahdanau Attention -17. LSTM Seq2Seq-manual + Bahdanau Attention -18. GRU Seq2Seq-manual + Bahdanau Attention -19. LSTM Bidirectional Seq2Seq-manual + Luong Attention -20. GRU Bidirectional Seq2Seq-manual + Luong Attention -21. LSTM Bidirectional Seq2Seq-manual + Bahdanau Attention -22. GRU Bidirectional Seq2Seq-manual + Bahdanau Attention -23. LSTM Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong -24. GRU Bidirectional Seq2Seq-manual + backward Bahdanau + forward Luong -25. LSTM Seq2Seq-API Greedy + Luong Attention -26. GRU Seq2Seq-API Greedy + Luong Attention -27. LSTM Seq2Seq-API Greedy + Bahdanau Attention -28. GRU Seq2Seq-API Greedy + Bahdanau Attention -29. LSTM Seq2Seq-API Beam Decoder -30. GRU Seq2Seq-API Beam Decoder -31. LSTM Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder -32. GRU Bidirectional Seq2Seq-API + Luong Attention + Beam Decoder -33. LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder -34. GRU Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder -35. Bytenet -36. LSTM Seq2Seq + tf.estimator -37. Capsule layers + LSTM Seq2Seq-API Greedy -38. Capsule layers + LSTM Seq2Seq-API + Luong Attention + Beam Decoder -39. 
LSTM Bidirectional Seq2Seq-API + backward Bahdanau + forward Luong + Stack Bahdanau Luong Attention + Beam Decoder + Dropout + L2 -40. DNC Seq2Seq -41. LSTM Bidirectional Seq2Seq-API + Luong Monotic Attention + Beam Decoder -42. LSTM Bidirectional Seq2Seq-API + Bahdanau Monotic Attention + Beam Decoder -43. End-to-End Memory Network + Basic cell -44. End-to-End Memory Network + LSTM cell -45. Attention is all you need -46. Transformer-XL -47. Attention is all you need + Beam Search -48. Fairseq -49. Conv-Encoder + LSTM -50. Bytenet Greedy -51. Residual GRU Bidirectional Seq2Seq-API Greedy -52. Google NMT -53. Dilated Seq2Seq -54. BERT Encoder + LSTM Luong Decoder -55. BERT Encoder + Dilated Fairseq +
Complete list (8 notebooks) + +1. Bidirectional RNN + CRF + Biaffine, arc accuracy 70.48%, types accuracy 65.18%, root accuracy 66.4% +2. Bidirectional RNN + Bahdanau + CRF + Biaffine, arc accuracy 70.82%, types accuracy 65.33%, root accuracy 66.77% +3. Bidirectional RNN + Luong + CRF + Biaffine, arc accuracy 71.22%, types accuracy 65.73%, root accuracy 67.23% +4. BERT Base + CRF + Biaffine, arc accuracy 64.30%, types accuracy 62.89%, root accuracy 74.19% +5. Bidirectional RNN + Biaffine Attention + Cross Entropy, arc accuracy 72.42%, types accuracy 63.53%, root accuracy 68.51% +6. BERT Base + Biaffine Attention + Cross Entropy, arc accuracy 72.85%, types accuracy 67.11%, root accuracy 73.93% +7. Bidirectional RNN + Stackpointer, arc accuracy 61.88%, types accuracy 48.20%, root accuracy 89.39% +8. XLNET Base + Biaffine Attention + Cross Entropy, arc accuracy 74.41%, types accuracy 71.37%, root accuracy 73.17%
-### [Embedded](embedded) +### [Entity-Tagging](entity-tagging) -Trained on [English sentiment dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/text-classification/data). +Trained on [CONLL NER](https://cogcomp.org/page/resource_view/81). -1. Word Vector using CBOW sample softmax -2. Word Vector using CBOW noise contrastive estimation -3. Word Vector using skipgram sample softmax -4. Word Vector using skipgram noise contrastive estimation -5. Lda2Vec Tensorflow -6. Supervised Embedded -7. Triplet-loss + LSTM -8. LSTM Auto-Encoder -9. Batch-All Triplet-loss LSTM -10. Fast-text -11. ELMO (biLM) -12. Triplet-loss + BERT +
Complete list (9 notebooks) + +1. Bidirectional RNN + CRF, test accuracy 96% +2. Bidirectional RNN + Luong Attention + CRF, test accuracy 93% +3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95% +4. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96% +5. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96% +6. Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 69% +7. Char Ngrams + Attention is All you Need + CRF, test accuracy 90% +8. BERT, test accuracy 99% +9. XLNET-Base, test accuracy 99% + +</details>
+ +### [Extractive Summarization](extractive-summarization) + +Trained on [CNN News dataset](https://cs.nyu.edu/~kcho/DMQA/). + +Accuracy based on ROUGE-2. + +
Complete list (4 notebooks) + +1. LSTM RNN, test accuracy 16.13% +2. Dilated-CNN, test accuracy 15.54% +3. Multihead Attention, test accuracy 26.33% +4. BERT-Base + +
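"Accuracy based on ROUGE-2" refers to bigram overlap between the generated and reference summaries. A minimal recall-only sketch in plain Python (the notebooks may use a full ROUGE implementation with precision and F-measure as well; this only illustrates the idea):

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, reference):
    """Fraction of reference bigrams that also occur in the candidate."""
    cand, ref = bigrams(candidate.split()), bigrams(reference.split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / sum(ref.values())

# 3 of the 5 reference bigrams appear in the candidate.
print(rouge2_recall("the cat sat on the mat", "the cat sat on a mat"))
```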
+ +### [Generator](generator) + +Trained on [Shakespeare dataset](generator/shakespeare.txt). + +
Complete list (15 notebooks) + +1. Character-wise RNN + LSTM +2. Character-wise RNN + Beam search +3. Character-wise RNN + LSTM + Embedding +4. Word-wise RNN + LSTM +5. Word-wise RNN + LSTM + Embedding +6. Character-wise + Seq2Seq + GRU +7. Word-wise + Seq2Seq + GRU +8. Character-wise RNN + LSTM + Bahdanau Attention +9. Character-wise RNN + LSTM + Luong Attention +10. Word-wise + Seq2Seq + GRU + Beam +11. Character-wise + Seq2Seq + GRU + Bahdanau Attention +12. Word-wise + Seq2Seq + GRU + Bahdanau Attention +13. Character-wise Dilated CNN + Beam search +14. Transformer + Beam search +15. Transformer XL + Beam search + +
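Several generator notebooks decode with beam search, which keeps the `width` best-scoring prefixes per step instead of a single greedy choice. A toy sketch, with a hypothetical next-character table standing in for the trained RNN:

```python
import math
from heapq import nlargest

def toy_model(prefix):
    """Stand-in for a trained character model: prefix -> {char: prob}."""
    table = {
        "": {"h": 0.6, "t": 0.4},
        "h": {"i": 0.7, "a": 0.3},
        "t": {"o": 0.9, "a": 0.1},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(model, width=2, steps=2):
    beams = [("", 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for ch, p in model(prefix).items():
                if ch == "<eos>":
                    candidates.append((prefix, score))
                else:
                    candidates.append((prefix + ch, score + math.log(p)))
        beams = nlargest(width, candidates, key=lambda b: b[1])
    return beams[0][0]

print(beam_search(toy_model))  # "hi": log(0.6 * 0.7) beats log(0.4 * 0.9)
```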
+ +### [Language-detection](language-detection) + +Trained on [Tatoeba dataset](http://downloads.tatoeba.org/exports/sentences.tar.bz2). + +
Complete list (1 notebook) + +1. Fast-text Char N-Grams + +</details>
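The fast-text style features behind the language-detection notebook are character n-grams taken over a word padded with boundary markers. A small sketch (the `<`/`>` markers and the 2-3 gram range are illustrative defaults, not necessarily the notebook's exact settings):

```python
def char_ngrams(word, n_min=2, n_max=3):
    """All character n-grams of length n_min..n_max over '<word>'."""
    padded = f"<{word}>"  # boundary markers, as fastText does for subwords
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

print(char_ngrams("ok"))  # ['<o', 'ok', 'k>', '<ok', 'ok>']
```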
+ +### [Neural Machine Translation](neural-machine-translation) + +Trained on [English-French](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/translate_enfr.py), accuracy table in [neural-machine-translation](neural-machine-translation). + +
Complete list (53 notebooks) + +1. basic-seq2seq +2. lstm-seq2seq +3. gru-seq2seq +4. basic-seq2seq-contrib-greedy +5. lstm-seq2seq-contrib-greedy +6. gru-seq2seq-contrib-greedy +7. basic-birnn-seq2seq +8. lstm-birnn-seq2seq +9. gru-birnn-seq2seq +10. basic-birnn-seq2seq-contrib-greedy +11. lstm-birnn-seq2seq-contrib-greedy +12. gru-birnn-seq2seq-contrib-greedy +13. basic-seq2seq-luong +14. lstm-seq2seq-luong +15. gru-seq2seq-luong +16. basic-seq2seq-bahdanau +17. lstm-seq2seq-bahdanau +18. gru-seq2seq-bahdanau +19. basic-birnn-seq2seq-bahdanau +20. lstm-birnn-seq2seq-bahdanau +21. gru-birnn-seq2seq-bahdanau +22. basic-birnn-seq2seq-luong +23. lstm-birnn-seq2seq-luong +24. gru-birnn-seq2seq-luong +25. lstm-seq2seq-contrib-greedy-luong +26. gru-seq2seq-contrib-greedy-luong +27. lstm-seq2seq-contrib-greedy-bahdanau +28. gru-seq2seq-contrib-greedy-bahdanau +29. lstm-seq2seq-contrib-beam-luong +30. gru-seq2seq-contrib-beam-luong +31. lstm-seq2seq-contrib-beam-bahdanau +32. gru-seq2seq-contrib-beam-bahdanau +33. lstm-birnn-seq2seq-contrib-beam-bahdanau +34. lstm-birnn-seq2seq-contrib-beam-luong +35. gru-birnn-seq2seq-contrib-beam-bahdanau +36. gru-birnn-seq2seq-contrib-beam-luong +37. lstm-birnn-seq2seq-contrib-beam-luongmonotonic +38. gru-birnn-seq2seq-contrib-beam-luongmonotic +39. lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic +40. gru-birnn-seq2seq-contrib-beam-bahdanaumonotic +41. residual-lstm-seq2seq-greedy-luong +42. residual-gru-seq2seq-greedy-luong +43. residual-lstm-seq2seq-greedy-bahdanau +44. residual-gru-seq2seq-greedy-bahdanau +45. memory-network-lstm-decoder-greedy +46. google-nmt +47. transformer-encoder-transformer-decoder +48. transformer-encoder-lstm-decoder-greedy +49. bertmultilanguage-encoder-bertmultilanguage-decoder +50. bertmultilanguage-encoder-lstm-decoder +51. bertmultilanguage-encoder-transformer-decoder +52. bertenglish-encoder-transformer-decoder +53. transformer-t2t-2gpu + +</details>
+ +### [OCR (optical character recognition)](ocr) + +
Complete list (2 notebooks) + +1. CNN + LSTM RNN, test accuracy 100% +2. Im2Latex, test accuracy 100% + +
### [POS-Tagging](pos-tagging) Trained on [CONLL POS](https://cogcomp.org/page/resource_view/81). +
Complete list (8 notebooks) + 1. Bidirectional RNN + CRF, test accuracy 92% 2. Bidirectional RNN + Luong Attention + CRF, test accuracy 91% 3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 91% @@ -334,50 +317,80 @@ Trained on [CONLL POS](https://cogcomp.org/page/resource_view/81). 7. Char Ngrams + Attention is you all Need + CRF, test accuracy 89% 8. BERT, test accuracy 99% -### [Entity-Tagging](entity-tagging) +
-Trained on [CONLL NER](https://cogcomp.org/page/resource_view/81). +### [Question-Answers](question-answer) -1. Bidirectional RNN + CRF, test accuracy 96% -2. Bidirectional RNN + Luong Attention + CRF, test accuracy 93% -3. Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 95% -4. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96% -5. Char Ngrams + Bidirectional RNN + Bahdanau Attention + CRF, test accuracy 96% -6. Char Ngrams + Residual Network + Bahdanau Attention + CRF, test accuracy 69% -7. Char Ngrams + Attention is you all Need + CRF, test accuracy 90% -8. BERT, test accuracy 99% +Trained on [bAbI Dataset](https://research.fb.com/downloads/babi/). -### [Dependency-Parser](dependency-parser) +
Complete list (4 notebooks) + +1. End-to-End Memory Network + Basic cell +2. End-to-End Memory Network + GRU cell +3. End-to-End Memory Network + LSTM cell +4. Dynamic Memory + +
+ +### [Sentence-pair](sentence-pair) + +Trained on [Cornell Movie--Dialogs Corpus](https://people.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html) -Trained on [CONLL English Dependency](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/dependency-parser/dev.conll.txt). +
Complete list (1 notebook) -1. Bidirectional RNN + Bahdanau Attention + CRF -2. Bidirectional RNN + Luong Attention + CRF -3. Residual Network + Bahdanau Attention + CRF -4. Residual Network + Bahdanau Attention + Char Embedded + CRF -5. Attention is all you need + CRF +1. BERT + +</details>
+ +### [Speech to Text](speech-to-text) + +Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487). + +
Complete list (11 notebooks) + +1. Tacotron, https://github.com/Kyubyong/tacotron_asr, test accuracy 77.09% +2. BiRNN LSTM, test accuracy 84.66% +3. BiRNN Seq2Seq + Luong Attention + Cross Entropy, test accuracy 87.86% +4. BiRNN Seq2Seq + Bahdanau Attention + Cross Entropy, test accuracy 89.28% +5. BiRNN Seq2Seq + Bahdanau Attention + CTC, test accuracy 86.35% +6. BiRNN Seq2Seq + Luong Attention + CTC, test accuracy 80.30% +7. CNN RNN + Bahdanau Attention, test accuracy 80.23% +8. Dilated CNN RNN, test accuracy 31.60% +9. Wavenet, test accuracy 75.11% +10. Deep Speech 2, test accuracy 81.40% +11. Wav2Vec Transfer learning BiRNN LSTM, test accuracy 83.24% + +
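Several speech-to-text notebooks train with CTC loss; at decode time, a greedy CTC step merges repeated symbols and drops the blank token. A minimal sketch of that collapse (the `_` blank symbol is an assumption; real decoders run this over per-frame argmax outputs):

```python
def ctc_collapse(path, blank="_"):
    """Greedy CTC post-processing: merge repeats, then remove blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)

print(ctc_collapse("hh_e_ll_ll_oo"))  # "hello"
```

Note that the blank between the two `ll` runs is what lets a doubled letter survive the repeat-merging step.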
+ +### [Spelling correction](spelling-correction) + +
Complete list (4 notebooks) + +1. BERT-Base +2. XLNET-Base +3. BERT-Base Fast +4. BERT-Base accurate + +
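Classical spelling correction ranks candidate words by edit distance to the misspelling; the BERT/XLNET notebooks above replace that with masked-token scoring, but Levenshtein distance remains the usual baseline. A standard dynamic-programming sketch:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(levenshtein("speling", "spelling"))  # 1
```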
### [SQUAD Question-Answers](squad-qa) Trained on [SQUAD Dataset](https://rajpurkar.github.io/SQuAD-explorer/). +
Complete list (1 notebook) + 1. BERT, ```json {"exact_match": 77.57805108798486, "f1": 86.18327335287402} ``` -### [Question-Answers](question-answer) - -Trained on [bAbI Dataset](https://research.fb.com/downloads/babi/). - -1. End-to-End Memory Network + Basic cell -2. End-to-End Memory Network + GRU cell -3. End-to-End Memory Network + LSTM cell -4. Dynamic Memory +</details>
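The `exact_match`/`f1` figures are the standard SQuAD metrics: token-level overlap after lowercasing and stripping punctuation and articles. A rough sketch of the F1 part (the official evaluation script differs in small normalization details, so treat this as illustrative only):

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles -- SQuAD-style."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def f1_score(prediction, truth):
    pred, gold = normalize(prediction), normalize(truth)
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(f1_score("the Eiffel Tower", "Eiffel Tower, Paris"))
```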
### [Stemming](stemming) -Trained on [English Lemmatization](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/stemming/lemmatization-en.txt). +Trained on [English Lemmatization](stemming/lemmatization-en.txt). + +
Complete list (6 notebooks) 1. LSTM + Seq2Seq + Beam 2. GRU + Seq2Seq + Beam @@ -386,66 +399,138 @@ Trained on [English Lemmatization](https://github.com/huseinzol05/NLP-Models-Ten 5. DNC + Seq2Seq + Greedy 6. BiRNN + Bahdanau + Copynet -### [Abstractive Summarization](abstractive-summarization) +
-Trained on [India news](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/abstractive-summarization/dataset). +### [Text Augmentation](text-augmentation) -Accuracy based on 10 epochs only, calculated using word positions. +
Complete list (8 notebooks) -1. LSTM Seq2Seq using topic modelling, test accuracy 13.22% -2. LSTM Seq2Seq + Luong Attention using topic modelling, test accuracy 12.39% -3. LSTM Seq2Seq + Beam Decoder using topic modelling, test accuracy 10.67% -4. LSTM Bidirectional + Luong Attention + Beam Decoder using topic modelling, test accuracy 8.29% -5. Pointer-Generator + Bahdanau, https://github.com/xueyouluo/my_seq2seq, test accuracy 15.51% -6. Copynet, test accuracy 11.15% -7. Pointer-Generator + Luong, https://github.com/xueyouluo/my_seq2seq, test accuracy 16.51% -8. Dilated Seq2Seq, test accuracy 10.88% -9. Dilated Seq2Seq + Self Attention, test accuracy 11.54% -10. BERT + Dilated Fairseq, test accuracy 13.5% -11. self-attention + Pointer-Generator, test accuracy 4.34% -12. Dilated-Fairseq + Pointer-Generator, test accuracy 5.57% +1. Pretrained Glove +2. GRU VAE-seq2seq-beam TF-probability +3. LSTM VAE-seq2seq-beam TF-probability +4. GRU VAE-seq2seq-beam + Bahdanau Attention TF-probability +5. VAE + Deterministic Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention +6. VAE + VAE Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention +7. BERT-Base + Nucleus Sampling +8. XLNET-Base + Nucleus Sampling -### [Extractive Summarization](extractive-summarization) +
-Trained on [random books](https://github.com/huseinzol05/NLP-Models-Tensorflow/tree/master/extractive-summarization/books). +### [Text classification](text-classification) -1. Skip-thought Vector -2. Residual Network using Atrous CNN -3. Residual Network using Atrous CNN + Bahdanau Attention +Trained on [English sentiment dataset](text-classification/data), accuracy table in [text-classification](text-classification). -### [OCR (optical character recognition)](ocr) +
Complete list (79 notebooks) -1. CNN + LSTM RNN +1. Basic cell RNN +2. Basic cell RNN + Hinge +3. Basic cell RNN + Huber +4. Basic cell Bidirectional RNN +5. Basic cell Bidirectional RNN + Hinge +6. Basic cell Bidirectional RNN + Huber +7. LSTM cell RNN +8. LSTM cell RNN + Hinge +9. LSTM cell RNN + Huber +10. LSTM cell Bidirectional RNN +11. LSTM cell Bidirectional RNN + Huber +12. LSTM cell RNN + Dropout + L2 +13. GRU cell RNN +14. GRU cell RNN + Hinge +15. GRU cell RNN + Huber +16. GRU cell Bidirectional RNN +17. GRU cell Bidirectional RNN + Hinge +18. GRU cell Bidirectional RNN + Huber +19. LSTM RNN + Conv2D +20. K-max Conv1d +21. LSTM RNN + Conv1D + Highway +22. LSTM RNN + Basic Attention +23. LSTM Dilated RNN +24. Layer-Norm LSTM cell RNN +25. Only Attention Neural Network +26. Multihead-Attention Neural Network +27. Neural Turing Machine +28. LSTM Seq2Seq +29. LSTM Seq2Seq + Luong Attention +30. LSTM Seq2Seq + Bahdanau Attention +31. LSTM Seq2Seq + Beam Decoder +32. LSTM Bidirectional Seq2Seq +33. Pointer Net +34. LSTM cell RNN + Bahdanau Attention +35. LSTM cell RNN + Luong Attention +36. LSTM cell RNN + Stack Bahdanau Luong Attention +37. LSTM cell Bidirectional RNN + backward Bahdanau + forward Luong +38. Bytenet +39. Fast-slow LSTM +40. Siamese Network +41. LSTM Seq2Seq + tf.estimator +42. Capsule layers + RNN LSTM +43. Capsule layers + LSTM Seq2Seq +44. Capsule layers + LSTM Bidirectional Seq2Seq +45. Nested LSTM +46. LSTM Seq2Seq + Highway +47. Triplet loss + LSTM +48. DNC (Differentiable Neural Computer) +49. ConvLSTM +50. Temporal Convd Net +51. Batch-all Triplet-loss + LSTM +52. Fast-text +53. Gated Convolution Network +54. Simple Recurrent Unit +55. LSTM Hierarchical Attention Network +56. Bidirectional Transformers +57. Dynamic Memory Network +58. Entity Network +59. End-to-End Memory Network +60. BOW-Chars Deep sparse Network +61. Residual Network using Atrous CNN +62. Residual Network using Atrous CNN + Bahdanau Attention +63. 
Deep pyramid CNN +64. Transformer-XL +65. Transfer learning GPT-2 345M +66. Quasi-RNN +67. Tacotron +68. Slice GRU +69. Slice GRU + Bahdanau +70. Wavenet +71. Transfer learning BERT Base +72. Transfer learning XL-net Large +73. LSTM BiRNN global Max and average pooling +74. Transfer learning BERT Base drop 6 layers +75. Transfer learning BERT Large drop 12 layers +76. Transfer learning XL-net Base +77. Transfer learning ALBERT +78. Transfer learning ELECTRA Base +79. Transfer learning ELECTRA Large -### [Sentence-pair](sentence-pair) +
-Trained on [Cornell Movie--Dialogs Corpus](https://people.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html) +### [Text Similarity](text-similarity) -1. BERT +Trained on [MNLI](https://cims.nyu.edu/~sbowman/multinli/). -### [Speech to Text](speech-to-text) +
Complete list (10 notebooks) -Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487). +1. BiRNN + Contrastive loss, test accuracy 73.032% +2. BiRNN + Cross entropy, test accuracy 74.265% +3. BiRNN + Circle loss, test accuracy 75.857% +4. BiRNN + Proxy loss, test accuracy 48.37% +5. BERT Base + Cross entropy, test accuracy 91.123% +6. BERT Base + Circle loss, test accuracy 89.903% +7. ELECTRA Base + Cross entropy, test accuracy 96.317% +8. ELECTRA Base + Circle loss, test accuracy 95.603% +9. XLNET Base + Cross entropy, test accuracy 93.998% +10. XLNET Base + Circle loss, test accuracy 94.033% -1. Tacotron, https://github.com/Kyubyong/tacotron_asr -2. Bidirectional RNN + Greedy CTC -3. Bidirectional RNN + Beam CTC -4. Seq2Seq + Bahdanau Attention + Beam CTC -5. Seq2Seq + Luong Attention + Beam CTC -6. Bidirectional RNN + Attention + Beam CTC -7. Wavenet -8. CNN encoder + RNN decoder + Bahdanau Attention -9. CNN encoder + RNN decoder + Luong Attention -10. Dilation CNN + GRU Bidirectional -11. Deep speech 2 -12. Pyramid Dilated CNN +
### [Text to Speech](text-to-speech) Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487). +
Complete list (8 notebooks) + 1. Tacotron, https://github.com/Kyubyong/tacotron -2. Fairseq + Dilated CNN vocoder +2. CNN Seq2seq + Dilated CNN vocoder 3. Seq2Seq + Bahdanau Attention 4. Seq2Seq + Luong Attention 5. Dilated CNN + Monothonic Attention + Dilated CNN vocoder @@ -453,69 +538,90 @@ Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/18 7. Deep CNN + Monothonic Attention + Dilated CNN vocoder 8. Deep CNN + Self Attention + Dilated CNN vocoder -### [Old-to-Young Vocoder](vocoder) - -Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487). - -1. Dilated CNN - -### [Generator](generator) - -Trained on [Shakespeare dataset](https://github.com/huseinzol05/NLP-Models-Tensorflow/blob/master/generator/shakespeare.txt). - -1. Character-wise RNN + LSTM -2. Character-wise RNN + Beam search -3. Character-wise RNN + LSTM + Embedding -4. Word-wise RNN + LSTM -5. Word-wise RNN + LSTM + Embedding -6. Character-wise + Seq2Seq + GRU -7. Word-wise + Seq2Seq + GRU -8. Character-wise RNN + LSTM + Bahdanau Attention -9. Character-wise RNN + LSTM + Luong Attention -10. Word-wise + Seq2Seq + GRU + Beam -11. Character-wise + Seq2Seq + GRU + Bahdanau Attention -12. Word-wise + Seq2Seq + GRU + Bahdanau Attention -13. Character-wise Dilated CNN + Beam search -14. Transformer + Beam search -15. Transformer XL + Beam search +
### [Topic Generator](topic-generator) Trained on [Malaysia news](https://github.com/huseinzol05/Malaya-Dataset/raw/master/news/news.zip). +
Complete list (4 notebooks) + 1. TAT-LSTM 2. TAV-LSTM 3. MTA-LSTM -4. Dilated Fairseq +4. Dilated CNN Seq2seq -### [Language-detection](language-detection) +
-Trained on [Tatoeba dataset](http://downloads.tatoeba.org/exports/sentences.tar.bz2). +### [Topic Modeling](topic-model) -1. Fast-text Char N-Grams +Extracted from [English sentiment dataset](text-classification/data). -### [Text Similarity](text-similarity) +
Complete list (3 notebooks) -Trained on [First Quora Dataset Release: Question Pairs](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs). +1. LDA2Vec +2. BERT Attention +3. XLNET Attention -1. BiRNN + Contrastive loss, test accuracy 76.50% -2. Dilated CNN + Contrastive loss, test accuracy 72.98% -3. Transformer + Contrastive loss, test accuracy 73.48% -4. Dilated CNN + Cross entropy, test accuracy 72.27% -5. Transformer + Cross entropy, test accuracy 71.1% -6. Transfer learning BERT base + Cross entropy, test accuracy 90% +
-### [Text Augmentation](text-augmentation) +### [Unsupervised Extractive Summarization](unsupervised-extractive-summarization) -1. Pretrained Glove -2. GRU VAE-seq2seq-beam TF-probability -3. LSTM VAE-seq2seq-beam TF-probability -4. GRU VAE-seq2seq-beam + Bahdanau Attention TF-probability -5. VAE + Deterministic Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention -6. VAE + VAE Bahdanau Attention, https://github.com/HareeshBahuleyan/tf-var-attention +Trained on [random books](extractive-summarization/books). + +
Complete list (3 notebooks) + +1. Skip-thought Vector +2. Residual Network using Atrous CNN +3. Residual Network using Atrous CNN + Bahdanau Attention + +
+ +### [Vectorizer](vectorizer) + +Trained on [English sentiment dataset](text-classification/data). + +
Complete list (11 notebooks) + +1. Word Vector using CBOW sample softmax +2. Word Vector using CBOW noise contrastive estimation +3. Word Vector using skipgram sample softmax +4. Word Vector using skipgram noise contrastive estimation +5. Supervised Embedded +6. Triplet-loss + LSTM +7. LSTM Auto-Encoder +8. Batch-All Triplet-loss LSTM +9. Fast-text +10. ELMO (biLM) +11. Triplet-loss + BERT + +
+
+### [Visualization](visualization)
+
+
Complete list (4 notebooks)
+
+1. Attention heatmap on Bahdanau Attention
+2. Attention heatmap on Luong Attention
+3. BERT attention, https://github.com/hsm207/bert_attn_viz
+4. XLNET attention
+
+
+
+### [Old-to-Young Vocoder](vocoder)
+
+Trained on [Toronto speech dataset](https://tspace.library.utoronto.ca/handle/1807/24487).
+
+
Complete list (1 notebook)
+
+1. Dilated CNN
+
+
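The vocoder uses dilated convolutions because stacking them grows the receptive field exponentially while keeping the layer count small, which matters at audio sample rates. A quick sketch of that arithmetic (kernel size and the doubling dilation schedule are assumptions, in the WaveNet style, not read from the notebook):

```python
def receptive_field(dilations, kernel_size=2):
    """Receptive field (in samples) of stacked 1-D dilated convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d    # each layer widens by (k-1) * dilation
    return field

# Doubling dilations: five layers already see 32 samples back.
print(receptive_field([1, 2, 4, 8, 16]))  # 32
```

With undilated convolutions of the same kernel size, reaching the same 32-sample window would take 31 layers instead of 5.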
### [Attention](attention)
+
Complete list (8 notebooks)
+
 1. Bahdanau
 2. Luong
 3. Hierarchical
@@ -525,12 +631,7 @@ Trained on [First Quora Dataset Release: Question Pairs](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs).
 7. Bahdanau API
 8. Luong API

-### [Miscellaneous](misc)
-
-1. Attention heatmap on Bahdanau Attention
-2. Attention heatmap on Luong Attention
-3. BERT attention, https://github.com/hsm207/bert_attn_viz
-4. XLNET attention
+
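The Bahdanau and Luong variants listed above differ mainly in how the alignment score between a decoder query and the encoder states is computed: additive (a small tanh MLP) versus multiplicative (a bilinear dot product). A NumPy sketch of the two scoring functions with randomly initialized toy weights (shapes are illustrative, not the notebooks' exact graphs):

```python
import numpy as np

def bahdanau_score(query, keys, W1, W2, v):
    """Additive attention: v^T tanh(W1 q + W2 k), one score per encoder step."""
    return np.tanh(query @ W1 + keys @ W2) @ v

def luong_score(query, keys, W):
    """Multiplicative (general) attention: k^T W q, one score per encoder step."""
    return keys @ (W @ query)

rng = np.random.default_rng(0)
d = 4
query = rng.normal(size=d)
keys = rng.normal(size=(3, d))                     # 3 encoder timesteps
scores = bahdanau_score(query, keys,
                        rng.normal(size=(d, d)),   # W1, W2, v: toy parameters
                        rng.normal(size=(d, d)),
                        rng.normal(size=d))
weights = np.exp(scores) / np.exp(scores).sum()    # softmax -> attention weights
print(weights.sum())                               # 1.0
```

Either way the scores are softmaxed into weights that sum to one, and the context vector is the weighted sum of the encoder states; those weights are exactly what the heatmap notebooks visualize.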
### [Not-deep-learning](not-deep-learning) diff --git a/chatbot/README.md b/chatbot/README.md index ca04b27..ee2e312 100644 --- a/chatbot/README.md +++ b/chatbot/README.md @@ -6,7 +6,7 @@ ## Accuracy, not sorted -Based on 20 epochs accuracy. The results will be different on different dataset. Trained on a GTX 960, 4GB VRAM. +Based on training accuracy for 20 epochs. | name | accuracy | |------------------------------------------------------------|----------| diff --git a/dependency-parser/1.birnn-bahdanau.ipynb b/dependency-parser/1.birnn-bahdanau.ipynb deleted file mode 100644 index b4bd0f1..0000000 --- a/dependency-parser/1.birnn-bahdanau.ipynb +++ /dev/null @@ -1,899 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "from tqdm import tqdm\n", - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "with open('test.conll.txt') as fopen:\n", - " corpus = fopen.read().split('\\n')\n", - " \n", - "with open('dev.conll.txt') as fopen:\n", - " corpus_test = fopen.read().split('\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n", - "tag2idx = {'PAD': 0}\n", - "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n", - "word_idx = 3\n", - "tag_idx = 1\n", - "char_idx = 3\n", - "\n", - "def process_corpus(corpus, until = None):\n", - " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", - " sentences, words, depends, labels = [], [], [], []\n", - " temp_sentence, temp_word, temp_depend, temp_label = [], [], [], []\n", - " for sentence in corpus:\n", - " if len(sentence):\n", - " sentence = sentence.split('\\t')\n", - " for c in sentence[1]:\n", - " if c not in char2idx:\n", - " char2idx[c] = char_idx\n", - " char_idx += 1\n", - " if sentence[7] not in tag2idx:\n", - " 
tag2idx[sentence[7]] = tag_idx\n", - " tag_idx += 1\n", - " if sentence[1] not in word2idx:\n", - " word2idx[sentence[1]] = word_idx\n", - " word_idx += 1\n", - " temp_word.append(word2idx[sentence[1]])\n", - " temp_depend.append(int(sentence[6]))\n", - " temp_label.append(tag2idx[sentence[7]])\n", - " temp_sentence.append(sentence[1])\n", - " else:\n", - " words.append(temp_word)\n", - " depends.append(temp_depend)\n", - " labels.append(temp_label)\n", - " sentences.append(temp_sentence)\n", - " temp_word = []\n", - " temp_depend = []\n", - " temp_label = []\n", - " temp_sentence = []\n", - " return sentences[:-1], words[:-1], depends[:-1], labels[:-1]\n", - " \n", - "sentences, words, depends, labels = process_corpus(corpus)\n", - "sentences_test, words_test, depends_test, labels_test = process_corpus(corpus_test)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using TensorFlow backend.\n" - ] - } - ], - "source": [ - "from keras.preprocessing.sequence import pad_sequences" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "words = pad_sequences(words,padding='post')\n", - "depends = pad_sequences(depends,padding='post')\n", - "labels = pad_sequences(labels,padding='post')\n", - "\n", - "words_test = pad_sequences(words_test,padding='post')\n", - "depends_test = pad_sequences(depends_test,padding='post')\n", - "labels_test = pad_sequences(labels_test,padding='post')" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(1700, 118)" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words_test.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def generate_char_seq(batch, UNK = 2):\n", - " 
maxlen_c = max([len(k) for k in batch])\n", - " x = [[len(i) for i in k] for k in batch]\n", - " maxlen = max([j for i in x for j in i])\n", - " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n", - " for i in range(len(batch)):\n", - " for k in range(len(batch[i])):\n", - " for no, c in enumerate(batch[i][k]):\n", - " temp[i,k,-1-no] = char2idx.get(c, UNK)\n", - " return temp" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "idx2word = {idx: tag for tag, idx in word2idx.items()}\n", - "idx2tag = {i: w for w, i in tag2idx.items()}\n", - "\n", - "train_X = words\n", - "train_Y = labels\n", - "train_depends = depends\n", - "train_char = generate_char_seq(sentences)\n", - "\n", - "test_X = words_test\n", - "test_Y = labels_test\n", - "test_depends = depends_test\n", - "test_char = generate_char_seq(sentences_test)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " def __init__(\n", - " self,\n", - " dim_word,\n", - " dim_char,\n", - " dropout,\n", - " learning_rate,\n", - " hidden_size_char,\n", - " hidden_size_word,\n", - " num_layers,\n", - " maxlen\n", - " ):\n", - " def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " output_keep_prob = dropout,\n", - " )\n", - "\n", - " def bahdanau(embedded, size):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = hidden_size_word, memory = embedded\n", - " )\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = cells(hidden_size_word),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = hidden_size_word,\n", - " )\n", - "\n", - " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.char_ids = 
tf.placeholder(tf.int32, shape = [None, None, None])\n", - " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.maxlen = tf.shape(self.word_ids)[1]\n", - " self.lengths = tf.count_nonzero(self.word_ids, 1)\n", - "\n", - " self.word_embeddings = tf.Variable(\n", - " tf.truncated_normal(\n", - " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n", - " )\n", - " )\n", - " self.char_embeddings = tf.Variable(\n", - " tf.truncated_normal(\n", - " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n", - " )\n", - " )\n", - "\n", - " word_embedded = tf.nn.embedding_lookup(\n", - " self.word_embeddings, self.word_ids\n", - " )\n", - " char_embedded = tf.nn.embedding_lookup(\n", - " self.char_embeddings, self.char_ids\n", - " )\n", - " s = tf.shape(char_embedded)\n", - " char_embedded = tf.reshape(\n", - " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n", - " )\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(hidden_size_char),\n", - " cell_bw = cells(hidden_size_char),\n", - " inputs = char_embedded,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_char_%d' % (n),\n", - " )\n", - " char_embedded = tf.concat((out_fw, out_bw), 2)\n", - " output = tf.reshape(\n", - " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n", - " )\n", - " word_embedded = tf.concat([word_embedded, output], axis = -1)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(word_embedded, hidden_size_word),\n", - " cell_bw = bahdanau(word_embedded, hidden_size_word),\n", - " inputs = word_embedded,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_word_%d' % (n),\n", - " )\n", - " word_embedded 
= tf.concat((out_fw, out_bw), 2)\n", - "\n", - " logits = tf.layers.dense(word_embedded, len(idx2tag))\n", - " logits_depends = tf.layers.dense(word_embedded, maxlen)\n", - " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n", - " logits, self.labels, self.lengths\n", - " )\n", - " with tf.variable_scope(\"depends\"):\n", - " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n", - " logits_depends, self.depends, self.lengths\n", - " )\n", - " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", - " \n", - " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n", - " logits, transition_params, self.lengths\n", - " )\n", - " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n", - " logits_depends, transition_params_depends, self.lengths\n", - " )\n", - "\n", - " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", - " mask_label = tf.boolean_mask(self.labels, mask)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", - " \n", - " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n", - " mask_label = tf.boolean_mask(self.depends, mask)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a 
dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "dim_word = 128\n", - "dim_char = 256\n", - "dropout = 1\n", - "learning_rate = 1e-3\n", - "hidden_size_char = 64\n", - "hidden_size_word = 64\n", - "num_layers = 2\n", - "batch_size = 32\n", - "\n", - "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers,\n", - " words.shape[1])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 76/76 [00:43<00:00, 1.90it/s, accuracy=0.123, accuracy_depends=0.116, cost=104] \n", - "test minibatch loop: 100%|██████████| 54/54 [00:16<00:00, 3.61it/s, accuracy=0.136, accuracy_depends=0.0273, cost=168]\n", - "train minibatch loop: 0%| | 0/76 [00:000]" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch_y[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 3, 3, 5, 5, 0, 5, 11, 11, 11, 8, 3], dtype=int32)" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "deps[seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)" - ] - }, - "execution_count": 17, - "metadata": {}, - 
"output_type": "execute_result" - } - ], - "source": [ - "batch_depends[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb b/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb new file mode 100644 index 0000000..fc63018 --- /dev/null +++ b/dependency-parser/1.lstm-birnn-crf-biaffine.ipynb @@ -0,0 +1 @@ +{"cells":[{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true},"cell_type":"code","source":"!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n!pip install malaya -U","execution_count":1,"outputs":[{"output_type":"stream","text":"--2019-09-30 05:05:04-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 1668174 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-dev.conllu’\n\nen_ewt-ud-dev.conll 100%[===================>] 1.59M --.-KB/s in 0.05s \n\n2019-09-30 05:05:05 (30.7 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n\n--2019-09-30 05:05:05-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 13303045 (13M) [text/plain]\nSaving to: ‘en_ewt-ud-train.conllu’\n\nen_ewt-ud-train.con 100%[===================>] 12.69M --.-KB/s in 0.1s \n\n2019-09-30 05:05:06 (102 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n\n--2019-09-30 05:05:06-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 1661985 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-test.conllu’\n\nen_ewt-ud-test.conl 100%[===================>] 1.58M --.-KB/s in 0.05s \n\n2019-09-30 05:05:07 (31.0 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n\nCollecting malaya\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/11/5f8ea8da94136d1fb4db39931d4ed55ae51655a3212b33e5bf607271646e/malaya-2.7.7.0-py3-none-any.whl (2.1MB)\n\u001b[K |████████████████████████████████| 2.1MB 4.9MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: scipy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.2.1)\nCollecting dateparser (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/82/9d/51126ac615bbc4418478d725a5fa1a0f112059f6f111e4b48cfbe17ef9d0/dateparser-0.7.2-py2.py3-none-any.whl (352kB)\n\u001b[K |████████████████████████████████| 358kB 43.4MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: sentencepiece in /opt/conda/lib/python3.6/site-packages (from malaya) (0.1.83)\nCollecting bert-tensorflow (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)\n\u001b[K |████████████████████████████████| 71kB 26.2MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: tensorflow in /opt/conda/lib/python3.6/site-packages (from malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: networkx in /opt/conda/lib/python3.6/site-packages (from malaya) (2.3)\nRequirement already satisfied, skipping upgrade: ftfy in /opt/conda/lib/python3.6/site-packages (from malaya) (5.6)\nRequirement already satisfied, skipping upgrade: xgboost in /opt/conda/lib/python3.6/site-packages (from malaya) (0.90)\nRequirement already satisfied, skipping upgrade: sklearn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.0)\nRequirement already 
satisfied, skipping upgrade: requests in /opt/conda/lib/python3.6/site-packages (from malaya) (2.22.0)\nCollecting PySastrawi (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/84/b0a5454a040f81e81e6a95a5d5635f20ad43cc0c288f8b4966b339084962/PySastrawi-1.2.0-py2.py3-none-any.whl (210kB)\n\u001b[K |████████████████████████████████| 215kB 44.6MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: scikit-learn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.21.3)\nRequirement already satisfied, skipping upgrade: numpy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.16.4)\nRequirement already satisfied, skipping upgrade: unidecode in /opt/conda/lib/python3.6/site-packages (from malaya) (1.1.1)\nRequirement already satisfied, skipping upgrade: tzlocal in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.0.0)\nRequirement already satisfied, skipping upgrade: pytz in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.2)\nRequirement already satisfied, skipping upgrade: regex in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.8.19)\nRequirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.8.0)\nRequirement already satisfied, skipping upgrade: six in /opt/conda/lib/python3.6/site-packages (from bert-tensorflow->malaya) (1.12.0)\nRequirement already satisfied, skipping upgrade: wrapt>=1.11.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.11.2)\nRequirement already satisfied, skipping upgrade: astor>=0.6.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.8.0)\nRequirement already satisfied, skipping upgrade: wheel>=0.26 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.33.6)\nRequirement already satisfied, skipping upgrade: absl-py>=0.7.0 in /opt/conda/lib/python3.6/site-packages (from 
tensorflow->malaya) (0.8.0)\nRequirement already satisfied, skipping upgrade: gast>=0.2.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.3.2)\nRequirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.0.8)\nRequirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.1.0)\nRequirement already satisfied, skipping upgrade: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.24.0)\nRequirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.1.0)\nRequirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (3.7.1)\nRequirement already satisfied, skipping upgrade: tensorboard<1.15.0,>=1.14.0 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: google-pasta>=0.1.6 in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (0.1.7)\nRequirement already satisfied, skipping upgrade: decorator>=4.3.0 in /opt/conda/lib/python3.6/site-packages (from networkx->malaya) (4.4.0)\nRequirement already satisfied, skipping upgrade: wcwidth in /opt/conda/lib/python3.6/site-packages (from ftfy->malaya) (0.1.7)\nRequirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (3.0.4)\nRequirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (2019.9.11)\nRequirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in 
/opt/conda/lib/python3.6/site-packages (from requests->malaya) (2.8)\nRequirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (1.24.2)\nRequirement already satisfied, skipping upgrade: joblib>=0.11 in /opt/conda/lib/python3.6/site-packages (from scikit-learn->malaya) (0.13.2)\nRequirement already satisfied, skipping upgrade: h5py in /opt/conda/lib/python3.6/site-packages (from keras-applications>=1.0.6->tensorflow->malaya) (2.9.0)\nRequirement already satisfied, skipping upgrade: setuptools in /opt/conda/lib/python3.6/site-packages (from protobuf>=3.6.1->tensorflow->malaya) (41.2.0)\nRequirement already satisfied, skipping upgrade: markdown>=2.6.8 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (3.1.1)\nRequirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /opt/conda/lib/python3.6/site-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (0.16.0)\n","name":"stdout"},{"output_type":"stream","text":"Installing collected packages: dateparser, bert-tensorflow, PySastrawi, malaya\nSuccessfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 malaya-2.7.7.0\n","name":"stdout"}]},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import malaya\nimport re\nfrom malaya.texts._text_functions import split_into_sentences\nfrom malaya.texts import _regex\nimport numpy as np\nimport itertools\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing.sequence import pad_sequences\n\ntokenizer = malaya.preprocessing._tokenizer\nsplitter = split_into_sentences","execution_count":2,"outputs":[{"output_type":"stream","text":"not found any version, deleting previous version models..\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def 
is_number_regex(s):\n if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n return s.isdigit()\n return True\n\ndef preprocessing(w):\n if is_number_regex(w):\n return ''\n elif re.match(_regex._money, w):\n return ''\n elif re.match(_regex._date, w):\n return ''\n elif re.match(_regex._expressions['email'], w):\n return ''\n elif re.match(_regex._expressions['url'], w):\n return ''\n else:\n w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n return w","execution_count":3,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\ntag2idx = {'PAD': 0, '_': 1}\nchar2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\nword_idx = 3\ntag_idx = 2\nchar_idx = 3\n\nspecial_tokens = ['', '', '', '', '']\n\nfor t in special_tokens:\n word2idx[t] = word_idx\n word_idx += 1\n char2idx[t] = char_idx\n char_idx += 1\n \nword2idx, char2idx","execution_count":4,"outputs":[{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"({'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7},\n {'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '': 3,\n '': 4,\n '': 5,\n '': 6,\n '': 7})"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"PAD = \"_PAD\"\nPAD_POS = \"_PAD_POS\"\nPAD_TYPE = \"_\"\nPAD_CHAR = \"_PAD_CHAR\"\nROOT = \"_ROOT\"\nROOT_POS = \"_ROOT_POS\"\nROOT_TYPE = \"_\"\nROOT_CHAR = \"_ROOT_CHAR\"\nEND = \"_END\"\nEND_POS = \"_END_POS\"\nEND_TYPE = \"_\"\nEND_CHAR = \"_END_CHAR\"\n\ndef process_corpus(corpus, until = None):\n global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n first_time = True\n for sentence in corpus:\n try:\n if len(sentence):\n if sentence[0] == '#':\n continue\n if first_time:\n print(sentence)\n first_time = False\n sentence = sentence.split('\\t')\n for c in sentence[1]:\n if c not in 
char2idx:\n char2idx[c] = char_idx\n char_idx += 1\n if sentence[7] not in tag2idx:\n tag2idx[sentence[7]] = tag_idx\n tag_idx += 1\n sentence[1] = preprocessing(sentence[1])\n if sentence[1] not in word2idx:\n word2idx[sentence[1]] = word_idx\n word_idx += 1\n temp_word.append(word2idx[sentence[1]])\n temp_depend.append(int(sentence[6]))\n temp_label.append(tag2idx[sentence[7]])\n temp_sentence.append(sentence[1])\n temp_pos.append(sentence[3])\n else:\n if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n continue\n words.append(temp_word)\n depends.append(temp_depend)\n labels.append(temp_label)\n sentences.append( temp_sentence)\n pos.append(temp_pos)\n char_ = [[char2idx['_ROOT']]]\n for w in temp_sentence:\n if w in char2idx:\n char_.append([char2idx[w]])\n else:\n char_.append([char2idx[c] for c in w])\n chars.append(char_)\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n except Exception as e:\n print(e, sentence)\n return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]","execution_count":5,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-dev.conllu') as fopen:\n dev = fopen.read().split('\\n')\n\nsentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)","execution_count":6,"outputs":[{"output_type":"stream","text":"1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\ninvalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\ninvalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-test.conllu') as fopen:\n test = fopen.read().split('\\n')\n\nsentences_test, words_test, depends_test, 
labels_test, _, _ = process_corpus(test)\nsentences_test.extend(sentences_dev)\nwords_test.extend(words_dev)\ndepends_test.extend(depends_dev)\nlabels_test.extend(labels_dev)","execution_count":7,"outputs":[{"output_type":"stream","text":"1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\ninvalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-train.conllu') as fopen:\n train = fopen.read().split('\\n')\n\nsentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)","execution_count":8,"outputs":[{"output_type":"stream","text":"1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\ninvalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\ninvalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['9.1', 
'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\ninvalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\ninvalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\ninvalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\ninvalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\ninvalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 
'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\ninvalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"len(sentences_train), len(sentences_test)","execution_count":9,"outputs":[{"output_type":"execute_result","execution_count":9,"data":{"text/plain":"(12000, 3824)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"idx2word = {v:k for k, v in word2idx.items()}\nidx2tag = {v:k for k, v in tag2idx.items()}\nlen(idx2word)","execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"21974"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def generate_char_seq(batch, UNK = 2):\n maxlen_c = max([len(k) for k in batch])\n x = [[len(i) for i in k] for k in batch]\n maxlen = max([j for i in x for j in i])\n temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n for i in range(len(batch)):\n for k in range(len(batch[i])):\n for no, c in enumerate(batch[i][k]):\n temp[i,k,-1-no] = char2idx.get(c, UNK)\n return temp","execution_count":11,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"generate_char_seq(sentences_train[:5]).shape","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"(5, 36, 11)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"pad_sequences(words_train[:5],padding='post').shape","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"(5, 36)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"train_X = words_train\ntrain_Y = 
labels_train\ntrain_depends = depends_train\ntrain_char = sentences_train\n\ntest_X = words_test\ntest_Y = labels_test\ntest_depends = depends_test\ntest_char = sentences_test","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"class BiAAttention:\n def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n self.input_size_encoder = input_size_encoder\n self.input_size_decoder = input_size_decoder\n self.num_labels = num_labels\n \n self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n \n def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n batch = tf.shape(input_d)[0]\n length_decoder = tf.shape(input_d)[1]\n length_encoder = tf.shape(input_e)[1]\n out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n \n output = output + out_d + out_e\n \n if mask_d is not None:\n d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n output = output * d * e\n \n return output\n\nclass Model:\n def __init__(\n self,\n dim_word,\n dim_char,\n dropout,\n learning_rate,\n hidden_size_char,\n hidden_size_word,\n num_layers\n ):\n def cells(size, reuse = False):\n return tf.contrib.rnn.DropoutWrapper(\n tf.nn.rnn_cell.LSTMCell(\n size,\n initializer = tf.orthogonal_initializer(),\n reuse = reuse,\n ),\n output_keep_prob = dropout,\n )\n 
\n self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n self.labels = tf.placeholder(tf.int32, shape = [None, None])\n self.depends = tf.placeholder(tf.int32, shape = [None, None])\n self.maxlen = tf.shape(self.word_ids)[1]\n self.lengths = tf.count_nonzero(self.word_ids, 1)\n self.mask = tf.math.not_equal(self.word_ids, 0)\n float_mask = tf.cast(self.mask, tf.float32)\n \n self.arc_h = tf.layers.Dense(hidden_size_word)\n self.arc_c = tf.layers.Dense(hidden_size_word)\n self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n\n self.word_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n )\n )\n self.char_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n )\n )\n\n word_embedded = tf.nn.embedding_lookup(\n self.word_embeddings, self.word_ids\n )\n char_embedded = tf.nn.embedding_lookup(\n self.char_embeddings, self.char_ids\n )\n s = tf.shape(char_embedded)\n char_embedded = tf.reshape(\n char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n )\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_char),\n cell_bw = cells(hidden_size_char),\n inputs = char_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_char_%d' % (n),\n )\n char_embedded = tf.concat((out_fw, out_bw), 2)\n output = tf.reshape(\n char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n )\n word_embedded = tf.concat([word_embedded, output], axis = -1)\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_word),\n cell_bw = cells(hidden_size_word),\n inputs = word_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_word_%d' % (n),\n )\n word_embedded = 
tf.concat((out_fw, out_bw), 2)\n\n logits = tf.layers.dense(word_embedded, len(idx2tag))\n log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n logits, self.labels, self.lengths\n )\n arc_h = tf.nn.elu(self.arc_h(word_embedded))\n arc_c = tf.nn.elu(self.arc_c(word_embedded))\n out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n \n batch = tf.shape(out_arc)[0]\n batch_index = tf.range(0, batch)\n max_len = tf.shape(out_arc)[1]\n sec_max_len = tf.shape(out_arc)[2]\n \n minus_inf = -1e8\n minus_mask = (1 - float_mask) * minus_inf\n out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n \n child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n t = tf.transpose(self.depends)\n broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n tf.expand_dims(t, axis = 0),\n tf.expand_dims(child_index, axis = 0)], axis = 0))\n loss_arc = tf.gather_nd(loss_arc, concatenated)\n loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n \n loss_arc = tf.reduce_sum(-loss_arc) / num\n \n self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n \n self.optimizer = tf.train.AdamOptimizer(\n learning_rate = learning_rate\n ).minimize(self.cost)\n \n mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n \n self.tags_seq, _ = tf.contrib.crf.crf_decode(\n logits, transition_params, self.lengths\n )\n \n out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), 
out_arc)\n        self.heads = tf.argmax(out_arc, axis = 1)\n        \n        self.prediction = tf.boolean_mask(self.tags_seq, mask)\n        mask_label = tf.boolean_mask(self.labels, mask)\n        correct_pred = tf.equal(self.prediction, mask_label)\n        correct_index = tf.cast(correct_pred, tf.float32)\n        self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n        \n        self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n        mask_label = tf.boolean_mask(self.depends, mask)\n        correct_pred = tf.equal(self.prediction, mask_label)\n        correct_index = tf.cast(correct_pred, tf.float32)\n        self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tf.reset_default_graph()\nsess = tf.InteractiveSession()\n\ndim_word = 128\ndim_char = 256\ndropout = 1.0\nlearning_rate = 1e-3\nhidden_size_char = 128\nhidden_size_word = 128\nnum_layers = 2\n\nmodel = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\nsess.run(tf.global_variables_initializer())","execution_count":16,"outputs":[{"output_type":"stream","text":"WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"batch_x = train_X[:5]\nbatch_x = pad_sequences(batch_x,padding='post')\nbatch_char = train_char[:5]\nbatch_char = generate_char_seq(batch_char)\nbatch_y = train_Y[:5]\nbatch_y = pad_sequences(batch_y,padding='post')\nbatch_depends = train_depends[:5]\nbatch_depends = pad_sequences(batch_depends,padding='post')","execution_count":17,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"sess.run([model.accuracy, model.accuracy_depends, model.cost],\n         feed_dict = {model.word_ids: batch_x,\n                     model.char_ids: batch_char,\n                     model.labels: batch_y,\n                     model.depends: batch_depends})","execution_count":18,"outputs":[{"output_type":"execute_result","execution_count":18,"data":{"text/plain":"[0.0, 0.00862069, 94.88574]"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from tqdm import tqdm\n\nbatch_size = 32\nepoch = 15\n\nfor e in range(epoch):\n    train_acc, train_loss = [], []\n    test_acc, test_loss = [], []\n    train_acc_depends, test_acc_depends = [], []\n    \n    pbar = tqdm(\n        range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n    )\n    for i in pbar:\n        index = min(i + batch_size, len(train_X))\n        batch_x = train_X[i: index]\n        batch_x = pad_sequences(batch_x,padding='post')\n        batch_char = train_char[i: index]\n        batch_char = generate_char_seq(batch_char)\n        batch_y = train_Y[i: index]\n        batch_y = 
pad_sequences(batch_y,padding='post')\n batch_depends = train_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost, _ = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n train_loss.append(cost)\n train_acc.append(acc)\n train_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n pbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n test_loss.append(cost)\n test_acc.append(acc)\n test_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n \n print(\n 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n % (e, np.mean(train_loss), \n np.mean(train_acc), \n np.mean(train_acc_depends), \n np.mean(test_loss), \n np.mean(test_acc), \n np.mean(test_acc_depends)\n ))\n ","execution_count":19,"outputs":[{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:06<00:00, 2.96it/s, accuracy=0.79, accuracy_depends=0.607, cost=17.6] \ntest minibatch loop: 100%|██████████| 120/120 
[00:18<00:00, 6.36it/s, accuracy=0.822, accuracy_depends=0.599, cost=10.2]\ntrain minibatch loop: 0%| | 0/375 [00:000]" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch_y[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 4, 4, 4, 8, 8, 4, 10, 12, 12, 8, 4], dtype=int32)" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "deps[seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch_depends[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb b/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb new file mode 100644 index 0000000..9912dfc --- /dev/null +++ b/dependency-parser/2.lstm-birnn-bahdanau-crf-biaffine.ipynb @@ -0,0 +1,1622 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "kernelspec": { + "name": 
"python3", + "display_name": "Python 3" + }, + "language_info": { + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "version": "3.6.4", + "file_extension": ".py", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "name": "python", + "mimetype": "text/x-python" + }, + "colab": { + "name": "lstm-birnn-bahdanau-crf-biaffine.ipynb", + "provenance": [], + "collapsed_sections": [] + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "code", + "metadata": { + "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5", + "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", + "trusted": true, + "id": "Ljz2IbsWluHv", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "a2204647-1d43-4934-cec4-28a412730e57" + }, + "source": [ + "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n", + "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n", + "!wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n", + "!pip install malaya -U" + ], + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "text": [ + "--2019-09-30 05:12:41-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 1668174 (1.6M) [text/plain]\n", + "Saving to: ‘en_ewt-ud-dev.conllu’\n", + "\n", + "en_ewt-ud-dev.conll 100%[===================>] 1.59M --.-KB/s in 0.01s \n", + "\n", + "2019-09-30 05:12:47 (108 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n", + "\n", + "--2019-09-30 05:12:49-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 13303045 (13M) [text/plain]\n", + "Saving to: ‘en_ewt-ud-train.conllu’\n", + "\n", + "en_ewt-ud-train.con 100%[===================>] 12.69M --.-KB/s in 0.07s \n", + "\n", + "2019-09-30 05:12:51 (178 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n", + "\n", + "--2019-09-30 05:12:53-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 1661985 (1.6M) [text/plain]\n", + "Saving to: ‘en_ewt-ud-test.conllu’\n", + "\n", + "en_ewt-ud-test.conl 100%[===================>] 1.58M --.-KB/s in 0.03s \n", + "\n", + "2019-09-30 05:12:54 (58.9 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n", + "\n", + "Collecting malaya\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/11/5f8ea8da94136d1fb4db39931d4ed55ae51655a3212b33e5bf607271646e/malaya-2.7.7.0-py3-none-any.whl (2.1MB)\n", + "\u001b[K |████████████████████████████████| 2.1MB 34.6MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: tensorflow in /usr/local/lib/python3.6/dist-packages (from malaya) (1.14.0)\n", + "Collecting sentencepiece (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/14/3d/efb655a670b98f62ec32d66954e1109f403db4d937c50d779a75b9763a29/sentencepiece-0.1.83-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)\n", + "\u001b[K |████████████████████████████████| 1.0MB 42.9MB/s \n", + "\u001b[?25hCollecting PySastrawi (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/84/b0a5454a040f81e81e6a95a5d5635f20ad43cc0c288f8b4966b339084962/PySastrawi-1.2.0-py2.py3-none-any.whl (210kB)\n", + "\u001b[K |████████████████████████████████| 215kB 54.5MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: sklearn in /usr/local/lib/python3.6/dist-packages (from malaya) (0.0)\n", + "Collecting unidecode (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl (238kB)\n", + "\u001b[K |████████████████████████████████| 245kB 55.3MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: xgboost in /usr/local/lib/python3.6/dist-packages (from malaya) (0.90)\n", + "Requirement already satisfied, skipping upgrade: numpy in /usr/local/lib/python3.6/dist-packages (from malaya) 
(1.16.5)\n", + "Collecting bert-tensorflow (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)\n", + "\u001b[K |████████████████████████████████| 71kB 37.1MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: requests in /usr/local/lib/python3.6/dist-packages (from malaya) (2.21.0)\n", + "Requirement already satisfied, skipping upgrade: networkx in /usr/local/lib/python3.6/dist-packages (from malaya) (2.3)\n", + "Collecting dateparser (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/82/9d/51126ac615bbc4418478d725a5fa1a0f112059f6f111e4b48cfbe17ef9d0/dateparser-0.7.2-py2.py3-none-any.whl (352kB)\n", + "\u001b[K |████████████████████████████████| 358kB 59.1MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: scipy in /usr/local/lib/python3.6/dist-packages (from malaya) (1.3.1)\n", + "Requirement already satisfied, skipping upgrade: scikit-learn in /usr/local/lib/python3.6/dist-packages (from malaya) (0.21.3)\n", + "Collecting ftfy (from malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ca/2d9a5030eaf1bcd925dab392762b9709a7ad4bd486a90599d93cd79cb188/ftfy-5.6.tar.gz (58kB)\n", + "\u001b[K |████████████████████████████████| 61kB 27.2MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.0.8)\n", + "Requirement already satisfied, skipping upgrade: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.11.2)\n", + "Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.33.6)\n", + "Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from 
tensorflow->malaya) (0.2.2)\n", + "Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.15.0)\n", + "Requirement already satisfied, skipping upgrade: tensorboard<1.15.0,>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.14.0)\n", + "Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.8.0)\n", + "Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (3.7.1)\n", + "Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.12.0)\n", + "Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.1.0)\n", + "Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.1.0)\n", + "Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.15.0rc0,>=1.14.0rc0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (1.14.0)\n", + "Requirement already satisfied, skipping upgrade: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.1.7)\n", + "Requirement already satisfied, skipping upgrade: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow->malaya) (0.8.0)\n", + "Requirement already satisfied, skipping upgrade: urllib3<1.25,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (1.24.3)\n", + "Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (2019.6.16)\n", + "Requirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) 
(2.8)\n", + "Requirement already satisfied, skipping upgrade: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->malaya) (3.0.4)\n", + "Requirement already satisfied, skipping upgrade: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx->malaya) (4.4.0)\n", + "Requirement already satisfied, skipping upgrade: python-dateutil in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (2.5.3)\n", + "Collecting regex (from dateparser->malaya)\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6f/a6/99eeb5904ab763db87af4bd71d9b1dfdd9792681240657a4c0a599c10a81/regex-2019.08.19.tar.gz (654kB)\n", + "\u001b[K |████████████████████████████████| 655kB 45.7MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: pytz in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (2018.9)\n", + "Requirement already satisfied, skipping upgrade: tzlocal in /usr/local/lib/python3.6/dist-packages (from dateparser->malaya) (1.5.1)\n", + "Requirement already satisfied, skipping upgrade: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->malaya) (0.13.2)\n", + "Requirement already satisfied, skipping upgrade: wcwidth in /usr/local/lib/python3.6/dist-packages (from ftfy->malaya) (0.1.7)\n", + "Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow->malaya) (2.8.0)\n", + "Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (3.1.1)\n", + "Requirement already satisfied, skipping upgrade: setuptools>=41.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (41.2.0)\n", + "Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from 
tensorboard<1.15.0,>=1.14.0->tensorflow->malaya) (0.15.6)\n", + "Building wheels for collected packages: ftfy, regex\n", + " Building wheel for ftfy (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for ftfy: filename=ftfy-5.6-cp36-none-any.whl size=44553 sha256=a67cd3a8dec5d9ab36f166a19c9d55a545fcf8376d2dd4be7f822f0a7bd433ec\n", + " Stored in directory: /root/.cache/pip/wheels/43/34/ce/cbb38d71543c408de56f3c5e26ce8ba495a0fa5a28eaaf1046\n", + " Building wheel for regex (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for regex: filename=regex-2019.8.19-cp36-cp36m-linux_x86_64.whl size=609237 sha256=98055d96bc0b1d7a1f7761963e06648e4aee0db39cb10bb5ccdb3a451bb447ce\n", + " Stored in directory: /root/.cache/pip/wheels/90/04/07/b5010fb816721eb3d6dd64ed5cc8111ca23f97fdab8619b5be\n", + "Successfully built ftfy regex\n", + "Installing collected packages: sentencepiece, PySastrawi, unidecode, bert-tensorflow, regex, dateparser, ftfy, malaya\n", + "Successfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 ftfy-5.6 malaya-2.7.7.0 regex-2019.8.19 sentencepiece-0.1.83 unidecode-1.1.1\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a", + "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", + "trusted": true, + "id": "r3Uhw481luH7", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "91eb9012-8b77-4c5b-8a3e-b3e11576028d" + }, + "source": [ + "import malaya\n", + "import re\n", + "from malaya.texts._text_functions import split_into_sentences\n", + "from malaya.texts import _regex\n", + "import numpy as np\n", + "import itertools\n", + "import tensorflow as tf\n", + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "tokenizer = malaya.preprocessing._tokenizer\n", + "splitter = split_into_sentences" + ], + "execution_count": 2, + "outputs": [ + { + 
"output_type": "stream", + "text": [ + "not found any version, deleting previous version models..\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "nrGvtUSEluIC", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def is_number_regex(s):\n", + " if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n", + " return s.isdigit()\n", + " return True\n", + "\n", + "def preprocessing(w):\n", + " if is_number_regex(w):\n", + " return ''\n", + " elif re.match(_regex._money, w):\n", + " return ''\n", + " elif re.match(_regex._date, w):\n", + " return ''\n", + " elif re.match(_regex._expressions['email'], w):\n", + " return ''\n", + " elif re.match(_regex._expressions['url'], w):\n", + " return ''\n", + " else:\n", + " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n", + " return w" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "4uJQAfRNluIH", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 289 + }, + "outputId": "c9a41167-3117-49dd-c3f7-95c580e783ec" + }, + "source": [ + "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "tag2idx = {'PAD': 0, '_': 1}\n", + "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "word_idx = 3\n", + "tag_idx = 2\n", + "char_idx = 3\n", + "\n", + "special_tokens = ['', '', '', '', '']\n", + "\n", + "for t in special_tokens:\n", + " word2idx[t] = word_idx\n", + " word_idx += 1\n", + " char2idx[t] = char_idx\n", + " char_idx += 1\n", + " \n", + "word2idx, char2idx" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "({'': 5,\n", + " '': 7,\n", + " '': 4,\n", + " '': 3,\n", + " '': 6,\n", + " 'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2},\n", + " {'': 5,\n", + " '': 7,\n", + " '': 4,\n", + " '': 3,\n", + " '': 6,\n", + " 'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2})" + ] + }, + "metadata": { + "tags": 
[] + }, + "execution_count": 4 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "HfJNrFwZluIO", + "colab_type": "code", + "colab": {} + }, + "source": [ + "PAD = \"_PAD\"\n", + "PAD_POS = \"_PAD_POS\"\n", + "PAD_TYPE = \"_\"\n", + "PAD_CHAR = \"_PAD_CHAR\"\n", + "ROOT = \"_ROOT\"\n", + "ROOT_POS = \"_ROOT_POS\"\n", + "ROOT_TYPE = \"_\"\n", + "ROOT_CHAR = \"_ROOT_CHAR\"\n", + "END = \"_END\"\n", + "END_POS = \"_END_POS\"\n", + "END_TYPE = \"_\"\n", + "END_CHAR = \"_END_CHAR\"\n", + "\n", + "def process_corpus(corpus, until = None):\n", + " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", + " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n", + " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n", + " first_time = True\n", + " for sentence in corpus:\n", + " try:\n", + " if len(sentence):\n", + " if sentence[0] == '#':\n", + " continue\n", + " if first_time:\n", + " print(sentence)\n", + " first_time = False\n", + " sentence = sentence.split('\\t')\n", + " for c in sentence[1]:\n", + " if c not in char2idx:\n", + " char2idx[c] = char_idx\n", + " char_idx += 1\n", + " if sentence[7] not in tag2idx:\n", + " tag2idx[sentence[7]] = tag_idx\n", + " tag_idx += 1\n", + " sentence[1] = preprocessing(sentence[1])\n", + " if sentence[1] not in word2idx:\n", + " word2idx[sentence[1]] = word_idx\n", + " word_idx += 1\n", + " temp_word.append(word2idx[sentence[1]])\n", + " temp_depend.append(int(sentence[6]))\n", + " temp_label.append(tag2idx[sentence[7]])\n", + " temp_sentence.append(sentence[1])\n", + " temp_pos.append(sentence[3])\n", + " else:\n", + " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " continue\n", + " words.append(temp_word)\n", + " depends.append(temp_depend)\n", + " labels.append(temp_label)\n", + " 
sentences.append( temp_sentence)\n", + " pos.append(temp_pos)\n", + " char_ = [[char2idx['_ROOT']]]\n", + " for w in temp_sentence:\n", + " if w in char2idx:\n", + " char_.append([char2idx[w]])\n", + " else:\n", + " char_.append([char2idx[c] for c in w])\n", + " chars.append(char_)\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " except Exception as e:\n", + " print(e, sentence)\n", + " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "aLFEmcKPluIV", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 68 + }, + "outputId": "8e31626b-91e1-4951-96c2-01341e18ddea" + }, + "source": [ + "with open('en_ewt-ud-dev.conllu') as fopen:\n", + " dev = fopen.read().split('\\n')\n", + "\n", + "sentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "text": [ + "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n", + "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "AHD5Kgh_luIZ", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 71 + }, + "outputId": "dc86ed43-5ce2-4747-dc6a-76e30e9ab2c4" + }, + "source": [ + "with open('en_ewt-ud-test.conllu') as fopen:\n", + " test = fopen.read().split('\\n')\n", + "\n", + "sentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\n", + "sentences_test.extend(sentences_dev)\n", + 
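`process_corpus` relies on the tab-separated CoNLL-U column layout: index 1 is FORM, 3 is UPOS, 6 is HEAD and 7 is DEPREL. The "invalid literal for int()" messages in the outputs come from enhanced-dependency rows (IDs such as `10.1`) whose HEAD column is `_`; the `try/except` skips them on purpose. A minimal sketch of that column access, using the first token line printed in the dev-set output above:

```python
# token line copied from the notebook output above (tab-separated CoNLL-U)
line = "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_"

cols = line.split('\t')
form, upos, head, deprel = cols[1], cols[3], int(cols[6]), cols[7]
print(form, upos, head, deprel)  # -> From ADP 3 case

# enhanced-dependency rows carry a non-integer ID and '_' as HEAD,
# so int(cols[6]) raises ValueError, mirroring the logged errors
bad = "10.1\thas\thave\tVERB\tVBZ\t_\t_\t_\t8:parataxis\tCopyOf=-1".split('\t')
try:
    int(bad[6])
except ValueError as err:
    print(err)  # -> invalid literal for int() with base 10: '_'
```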
"words_test.extend(words_dev)\n", + "depends_test.extend(depends_dev)\n", + "labels_test.extend(labels_dev)" + ], + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "text": [ + "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n", + "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "n39ztGEXluIe", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 445 + }, + "outputId": "fd316d0f-b840-4ce3-fefe-edb15017dc93" + }, + "source": [ + "with open('en_ewt-ud-train.conllu') as fopen:\n", + " train = fopen.read().split('\\n')\n", + "\n", + "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "text": [ + "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n", + "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n", + "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 
'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n", + "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n", + "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n", + "invalid literal for 
int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n", + "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "RZ8MwuF9luIo", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "28460c3d-86bf-4e15-ef4b-2217734596a2" + }, + "source": [ + "len(sentences_train), len(sentences_test)" + ], + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(12000, 3824)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 9 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "Z7oKPBiMluIx", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "09b7af1e-8ad9-4ede-eb27-720e891dfa5a" + }, + "source": [ + "idx2word = {v:k for k, v in word2idx.items()}\n", + "idx2tag = {v:k for k, v in tag2idx.items()}\n", + "len(idx2word)" + ], + "execution_count": 10, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "21974" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 10 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "EikVfMyQluI2", + "colab_type": "code", + "colab": 
{} + }, + "source": [ + "def generate_char_seq(batch, UNK = 2):\n", + " maxlen_c = max([len(k) for k in batch])\n", + " x = [[len(i) for i in k] for k in batch]\n", + " maxlen = max([j for i in x for j in i])\n", + " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n", + " for i in range(len(batch)):\n", + " for k in range(len(batch[i])):\n", + " for no, c in enumerate(batch[i][k]):\n", + " temp[i,k,-1-no] = char2idx.get(c, UNK)\n", + " return temp" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "izRVCDaNluI5", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "bd93b03a-0a4b-4eb7-ed14-44876e33ca0d" + }, + "source": [ + "generate_char_seq(sentences_train[:5]).shape" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5, 36, 11)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 12 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "gS8Wlel5luJD", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "fb6df063-31c4-45e4-e590-8220685d4911" + }, + "source": [ + "pad_sequences(words_train[:5],padding='post').shape" + ], + "execution_count": 13, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5, 36)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 13 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "2EKNPE4mluJH", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_X = words_train\n", + "train_Y = labels_train\n", + "train_depends = depends_train\n", + "train_char = sentences_train\n", + "\n", + "test_X = words_test\n", + "test_Y = labels_test\n", + "test_depends = depends_test\n", + "test_char = sentences_test" + ], + "execution_count": 0, + "outputs": [] + 
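`generate_char_seq` above returns a `(batch, max sentence length, max word length)` array, writing each word's character ids from the right edge inward, so shorter words end up left-padded with zeros and the characters sit in reverse order. A toy reproduction under an illustrative `char2idx` (the ids below are ours, not the notebook's vocabulary):

```python
import numpy as np

char2idx = {'PAD': 0, 'UNK': 1, 'a': 3, 'b': 4, 'c': 5}  # illustrative ids

def generate_char_seq(batch, UNK=2):
    maxlen_c = max(len(sent) for sent in batch)            # longest sentence (words)
    maxlen = max(len(w) for sent in batch for w in sent)   # longest word (chars)
    temp = np.zeros((len(batch), maxlen_c, maxlen), dtype=np.int32)
    for i, sent in enumerate(batch):
        for k, word in enumerate(sent):
            for no, c in enumerate(word):
                temp[i, k, -1 - no] = char2idx.get(c, UNK)  # fill right-to-left
    return temp

out = generate_char_seq([['ab', 'abc'], ['c']])
print(out.shape)   # (2, 2, 3)
print(out[0, 0])   # [0 4 3]  'ab' reversed and left-padded
```

The right-to-left fill is why the notebook later takes `char_embedded[:, -1]` as the word summary: the last time step of the char-RNN always corresponds to a real character, never padding.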
}, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "IechxNL3luJW", + "colab_type": "code", + "colab": {} + }, + "source": [ + "class BiAAttention:\n", + " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n", + " self.input_size_encoder = input_size_encoder\n", + " self.input_size_decoder = input_size_decoder\n", + " self.num_labels = num_labels\n", + " \n", + " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n", + " batch = tf.shape(input_d)[0]\n", + " length_decoder = tf.shape(input_d)[1]\n", + " length_encoder = tf.shape(input_e)[1]\n", + " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n", + " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n", + " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n", + " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n", + " \n", + " output = output + out_d + out_e\n", + " \n", + " if mask_d is not None:\n", + " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n", + " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n", + " output = output * d * e\n", + " \n", + " return output\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " dim_word,\n", + " dim_char,\n", + " dropout,\n", + " learning_rate,\n", + " hidden_size_char,\n", + " hidden_size_word,\n", + " num_layers\n", + " ):\n", + " def cells(size, reuse = False):\n", + " return 
tf.contrib.rnn.DropoutWrapper(\n", + " tf.nn.rnn_cell.LSTMCell(\n", + " size,\n", + " initializer = tf.orthogonal_initializer(),\n", + " reuse = reuse,\n", + " ),\n", + " output_keep_prob = dropout,\n", + " )\n", + " \n", + " def bahdanau(embedded, size):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", + " num_units = hidden_size_word, memory = embedded\n", + " )\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = cells(hidden_size_word),\n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = hidden_size_word,\n", + " )\n", + " \n", + " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n", + " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n", + " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n", + " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n", + " self.maxlen = tf.shape(self.word_ids)[1]\n", + " self.lengths = tf.count_nonzero(self.word_ids, 1)\n", + " self.mask = tf.math.not_equal(self.word_ids, 0)\n", + " float_mask = tf.cast(self.mask, tf.float32)\n", + " \n", + " self.arc_h = tf.layers.Dense(hidden_size_word)\n", + " self.arc_c = tf.layers.Dense(hidden_size_word)\n", + " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n", + "\n", + " self.word_embeddings = tf.Variable(\n", + " tf.truncated_normal(\n", + " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n", + " )\n", + " )\n", + " self.char_embeddings = tf.Variable(\n", + " tf.truncated_normal(\n", + " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n", + " )\n", + " )\n", + "\n", + " word_embedded = tf.nn.embedding_lookup(\n", + " self.word_embeddings, self.word_ids\n", + " )\n", + " char_embedded = tf.nn.embedding_lookup(\n", + " self.char_embeddings, self.char_ids\n", + " )\n", + " s = tf.shape(char_embedded)\n", + " char_embedded = tf.reshape(\n", + " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n", + " )\n", + "\n", + 
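`BiAAttention.forward` above scores every (dependent, head) pair with a biaffine form, s_l(d_i, e_j) = d_i^T U_l e_j + (W_d d)_i + (W_e e)_j. Ignoring batching and masking, the same arithmetic can be sketched in NumPy (shapes and random weights are illustrative, not taken from the notebook):

```python
import numpy as np

def biaffine_scores(d, e, U, W_d, W_e):
    # d: (T_d, h) dependent states, e: (T_e, h) head states
    # U: (L, h, h) bilinear weights, W_d/W_e: (L, h) linear weights
    bilinear = np.einsum('ih,lhg,jg->lij', d, U, e)      # d_i^T U_l e_j
    lin_d = np.einsum('lh,ih->li', W_d, d)[:, :, None]   # broadcast over heads j
    lin_e = np.einsum('lh,jh->lj', W_e, e)[:, None, :]   # broadcast over dependents i
    return bilinear + lin_d + lin_e                      # (L, T_d, T_e)

rng = np.random.default_rng(0)
h, L, T = 8, 1, 5
scores = biaffine_scores(rng.normal(size=(T, h)), rng.normal(size=(T, h)),
                         rng.normal(size=(L, h, h)),
                         rng.normal(size=(L, h)), rng.normal(size=(L, h)))
print(scores.shape)  # (1, 5, 5)
```

With `num_labels = 1`, as in the notebook's arc scorer, the label axis is squeezed away and an argmax over the head axis picks each token's predicted head.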
" for n in range(num_layers):\n", + " (out_fw, out_bw), (\n", + " state_fw,\n", + " state_bw,\n", + " ) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(hidden_size_char),\n", + " cell_bw = cells(hidden_size_char),\n", + " inputs = char_embedded,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_char_%d' % (n),\n", + " )\n", + " char_embedded = tf.concat((out_fw, out_bw), 2)\n", + " output = tf.reshape(\n", + " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n", + " )\n", + " word_embedded = tf.concat([word_embedded, output], axis = -1)\n", + "\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (\n", + " state_fw,\n", + " state_bw,\n", + " ) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = bahdanau(word_embedded, hidden_size_word),\n", + " cell_bw = bahdanau(word_embedded, hidden_size_word),\n", + " inputs = word_embedded,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_word_%d' % (n),\n", + " )\n", + " word_embedded = tf.concat((out_fw, out_bw), 2)\n", + "\n", + " logits = tf.layers.dense(word_embedded, len(idx2tag))\n", + " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n", + " logits, self.labels, self.lengths\n", + " )\n", + " arc_h = tf.nn.elu(self.arc_h(word_embedded))\n", + " arc_c = tf.nn.elu(self.arc_c(word_embedded))\n", + " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n", + " \n", + " batch = tf.shape(out_arc)[0]\n", + " batch_index = tf.range(0, batch)\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " \n", + " minus_inf = -1e8\n", + " minus_mask = (1 - float_mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n", + " num = 
tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n", + " \n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n", + " t = tf.transpose(self.depends)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n", + " \n", + " loss_arc = tf.reduce_sum(-loss_arc) / num\n", + " \n", + " self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n", + " \n", + " self.optimizer = tf.train.AdamOptimizer(\n", + " learning_rate = learning_rate\n", + " ).minimize(self.cost)\n", + " \n", + " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", + " \n", + " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n", + " logits, transition_params, self.lengths\n", + " )\n", + " \n", + " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n", + " self.heads = tf.argmax(out_arc, axis = 1)\n", + " \n", + " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", + " mask_label = tf.boolean_mask(self.labels, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n", + " mask_label = tf.boolean_mask(self.depends, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy_depends = 
tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "ORr-2ouXluJl", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 496 + }, + "outputId": "5c0c29e2-2502-49ce-f641-8973f436b29e" + }, + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "dim_word = 128\n", + "dim_char = 256\n", + "dropout = 1.0\n", + "learning_rate = 1e-3\n", + "hidden_size_char = 128\n", + "hidden_size_word = 128\n", + "num_layers = 2\n", + "\n", + "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\n", + "sess.run(tf.global_variables_initializer())" + ], + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :48: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :107: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for 
updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :128: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py:99: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :144: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "dim is deprecated, use axis instead\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "4zkpDRaDluJq", + "colab_type": "code", + "colab": {} + }, + "source": [ + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_char = train_char[:5]\n", + "batch_char = generate_char_seq(batch_char)\n", + "batch_y = 
train_Y[:5]\n", + "batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "wL67WIkMluJz", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "5a4c71b2-49a3-439c-ed41-be31555eefb3" + }, + "source": [ + "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n", + " feed_dict = {model.word_ids: batch_x,\n", + " model.char_ids: batch_char,\n", + " model.labels: batch_y,\n", + " model.depends: batch_depends})" + ], + "execution_count": 18, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[0.0, 0.094827585, 95.5533]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "trusted": true, + "id": "I0lyT0z-luJ3", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "4bbc5a14-c8ac-4123-a27b-e8a4e4c852a1" + }, + "source": [ + "from tqdm import tqdm\n", + "\n", + "batch_size = 32\n", + "epoch = 15\n", + "\n", + "for e in range(epoch):\n", + " train_acc, train_loss = [], []\n", + " test_acc, test_loss = [], []\n", + " train_acc_depends, test_acc_depends = [], []\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_X))\n", + " batch_x = train_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_char = train_char[i: index]\n", + " batch_char = generate_char_seq(batch_char)\n", + " batch_y = train_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = train_depends[i: index]\n", + " batch_depends = 
pad_sequences(batch_depends,padding='post')\n", + " \n", + " acc_depends, acc, cost, _ = sess.run(\n", + " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n", + " feed_dict = {\n", + " model.word_ids: batch_x,\n", + " model.char_ids: batch_char,\n", + " model.labels: batch_y,\n", + " model.depends: batch_depends\n", + " },\n", + " )\n", + " train_loss.append(cost)\n", + " train_acc.append(acc)\n", + " train_acc_depends.append(acc_depends)\n", + " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_char = test_char[i: index]\n", + " batch_char = generate_char_seq(batch_char)\n", + " batch_y = test_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = test_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + " \n", + " acc_depends, acc, cost = sess.run(\n", + " [model.accuracy_depends, model.accuracy, model.cost],\n", + " feed_dict = {\n", + " model.word_ids: batch_x,\n", + " model.char_ids: batch_char,\n", + " model.labels: batch_y,\n", + " model.depends: batch_depends\n", + " },\n", + " )\n", + " test_loss.append(cost)\n", + " test_acc.append(acc)\n", + " test_acc_depends.append(acc_depends)\n", + " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n", + " \n", + " \n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n", + " % (e, np.mean(train_loss), \n", + " np.mean(train_acc), \n", + " np.mean(train_acc_depends), \n", + " np.mean(test_loss), \n", + " np.mean(test_acc), \n", + " np.mean(test_acc_depends)\n", + " ))\n", + " " + ], + 
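The training loop above walks the dataset in slices of `batch_size` and pads each slice independently, so every minibatch gets its own maximum length. The slicing pattern (the helper name `minibatch_indices` is ours) boils down to:

```python
def minibatch_indices(n, batch_size):
    # yield (start, end) pairs covering 0..n in steps of batch_size,
    # clipping the last batch at n, like `min(i + batch_size, len(train_X))`
    for i in range(0, n, batch_size):
        yield i, min(i + batch_size, n)

print(list(minibatch_indices(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```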
"execution_count": 19, + "outputs": [ + { + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [02:50<00:00, 1.73it/s, accuracy=0.803, accuracy_depends=0.559, cost=16.9]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:21<00:00, 6.03it/s, accuracy=0.862, accuracy_depends=0.636, cost=10.2]\n", + "train minibatch loop: 0%| | 0/375 [00:00] 1.59M --.-KB/s in 0.05s \n\n2019-09-30 05:48:17 (30.8 MB/s) - ‘en_ewt-ud-dev.conllu’ saved [1668174/1668174]\n\n--2019-09-30 05:48:18-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 13303045 (13M) [text/plain]\nSaving to: ‘en_ewt-ud-train.conllu’\n\nen_ewt-ud-train.con 100%[===================>] 12.69M --.-KB/s in 0.1s \n\n2019-09-30 05:48:18 (120 MB/s) - ‘en_ewt-ud-train.conllu’ saved [13303045/13303045]\n\n--2019-09-30 05:48:19-- https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 1661985 (1.6M) [text/plain]\nSaving to: ‘en_ewt-ud-test.conllu’\n\nen_ewt-ud-test.conl 100%[===================>] 1.58M --.-KB/s in 0.05s \n\n2019-09-30 05:48:19 (32.0 MB/s) - ‘en_ewt-ud-test.conllu’ saved [1661985/1661985]\n\nCollecting malaya\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/b1/11/5f8ea8da94136d1fb4db39931d4ed55ae51655a3212b33e5bf607271646e/malaya-2.7.7.0-py3-none-any.whl (2.1MB)\n\u001b[K |████████████████████████████████| 2.1MB 4.9MB/s eta 0:00:01\n\u001b[?25hCollecting dateparser (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/82/9d/51126ac615bbc4418478d725a5fa1a0f112059f6f111e4b48cfbe17ef9d0/dateparser-0.7.2-py2.py3-none-any.whl (352kB)\n\u001b[K |████████████████████████████████| 358kB 34.2MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: scikit-learn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.21.3)\nCollecting PySastrawi (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/61/84/b0a5454a040f81e81e6a95a5d5635f20ad43cc0c288f8b4966b339084962/PySastrawi-1.2.0-py2.py3-none-any.whl (210kB)\n\u001b[K |████████████████████████████████| 215kB 42.5MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: unidecode in /opt/conda/lib/python3.6/site-packages (from malaya) (1.1.1)\nRequirement already satisfied, skipping upgrade: scipy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.2.1)\nRequirement already satisfied, skipping upgrade: ftfy in /opt/conda/lib/python3.6/site-packages (from malaya) (5.6)\nRequirement already satisfied, skipping upgrade: sentencepiece in /opt/conda/lib/python3.6/site-packages (from malaya) (0.1.83)\nCollecting bert-tensorflow (from malaya)\n\u001b[?25l Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)\n\u001b[K |████████████████████████████████| 
71kB 27.1MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: sklearn in /opt/conda/lib/python3.6/site-packages (from malaya) (0.0)\nRequirement already satisfied, skipping upgrade: requests in /opt/conda/lib/python3.6/site-packages (from malaya) (2.22.0)\nRequirement already satisfied, skipping upgrade: numpy in /opt/conda/lib/python3.6/site-packages (from malaya) (1.16.4)\nRequirement already satisfied, skipping upgrade: tensorflow in /opt/conda/lib/python3.6/site-packages (from malaya) (1.14.0)\nRequirement already satisfied, skipping upgrade: networkx in /opt/conda/lib/python3.6/site-packages (from malaya) (2.3)\nRequirement already satisfied, skipping upgrade: xgboost in /opt/conda/lib/python3.6/site-packages (from malaya) (0.90)\nRequirement already satisfied, skipping upgrade: tzlocal in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.0.0)\nRequirement already satisfied, skipping upgrade: regex in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.8.19)\nRequirement already satisfied, skipping upgrade: pytz in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2019.2)\nRequirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.6/site-packages (from dateparser->malaya) (2.8.0)\nRequirement already satisfied, skipping upgrade: joblib>=0.11 in /opt/conda/lib/python3.6/site-packages (from scikit-learn->malaya) (0.13.2)\nRequirement already satisfied, skipping upgrade: wcwidth in /opt/conda/lib/python3.6/site-packages (from ftfy->malaya) (0.1.7)\nRequirement already satisfied, skipping upgrade: six in /opt/conda/lib/python3.6/site-packages (from bert-tensorflow->malaya) (1.12.0)\nRequirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) (2019.9.11)\nRequirement already satisfied, skipping upgrade: idna<2.9,>=2.5 in /opt/conda/lib/python3.6/site-packages (from requests->malaya) 
(2.8)\nRequirement already satisfied, skipping upgrade: termcolor>=1.1.0 
in /opt/conda/lib/python3.6/site-packages (from tensorflow->malaya) (1.1.0)\n","name":"stdout"},{"output_type":"stream","text":"Installing collected packages: dateparser, PySastrawi, bert-tensorflow, malaya\nSuccessfully installed PySastrawi-1.2.0 bert-tensorflow-1.0.1 dateparser-0.7.2 malaya-2.7.7.0\n","name":"stdout"}]},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"import malaya\nimport re\nfrom malaya.texts._text_functions import split_into_sentences\nfrom malaya.texts import _regex\nimport numpy as np\nimport itertools\nimport tensorflow as tf\nfrom tensorflow.keras.preprocessing.sequence import pad_sequences\n\ntokenizer = malaya.preprocessing._tokenizer\nsplitter = split_into_sentences","execution_count":2,"outputs":[{"output_type":"stream","text":"not found any version, deleting previous version models..\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def 
is_number_regex(s):\n if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n return s.isdigit()\n return True\n\n# NOTE: the angle-bracket token names below were stripped during export;\n# '<num>', '<money>', '<date>', '<email>', '<url>' are reconstructed placeholders\ndef preprocessing(w):\n if is_number_regex(w):\n return '<num>'\n elif re.match(_regex._money, w):\n return '<money>'\n elif re.match(_regex._date, w):\n return '<date>'\n elif re.match(_regex._expressions['email'], w):\n return '<email>'\n elif re.match(_regex._expressions['url'], w):\n return '<url>'\n else:\n w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n return w","execution_count":3,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\ntag2idx = {'PAD': 0, '_': 1}\nchar2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\nword_idx = 3\ntag_idx = 2\nchar_idx = 3\n\nspecial_tokens = ['<num>', '<money>', '<date>', '<email>', '<url>']\n\nfor t in special_tokens:\n word2idx[t] = word_idx\n word_idx += 1\n char2idx[t] = char_idx\n char_idx += 1\n \nword2idx, char2idx","execution_count":4,"outputs":[{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"({'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '<num>': 3,\n '<money>': 4,\n '<date>': 5,\n '<email>': 6,\n '<url>': 7},\n {'PAD': 0,\n 'UNK': 1,\n '_ROOT': 2,\n '<num>': 3,\n '<money>': 4,\n '<date>': 5,\n '<email>': 6,\n '<url>': 7})"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"PAD = \"_PAD\"\nPAD_POS = \"_PAD_POS\"\nPAD_TYPE = \"_\"\nPAD_CHAR = \"_PAD_CHAR\"\nROOT = \"_ROOT\"\nROOT_POS = \"_ROOT_POS\"\nROOT_TYPE = \"_\"\nROOT_CHAR = \"_ROOT_CHAR\"\nEND = \"_END\"\nEND_POS = \"_END_POS\"\nEND_TYPE = \"_\"\nEND_CHAR = \"_END_CHAR\"\n\ndef process_corpus(corpus, until = None):\n global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n first_time = True\n for sentence in corpus:\n try:\n if len(sentence):\n if sentence[0] == '#':\n continue\n if first_time:\n print(sentence)\n first_time = False\n sentence = sentence.split('\\t')\n for c in sentence[1]:\n if c not in 
char2idx:\n char2idx[c] = char_idx\n char_idx += 1\n if sentence[7] not in tag2idx:\n tag2idx[sentence[7]] = tag_idx\n tag_idx += 1\n sentence[1] = preprocessing(sentence[1])\n if sentence[1] not in word2idx:\n word2idx[sentence[1]] = word_idx\n word_idx += 1\n temp_word.append(word2idx[sentence[1]])\n temp_depend.append(int(sentence[6]))\n temp_label.append(tag2idx[sentence[7]])\n temp_sentence.append(sentence[1])\n temp_pos.append(sentence[3])\n else:\n if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n continue\n words.append(temp_word)\n depends.append(temp_depend)\n labels.append(temp_label)\n sentences.append( temp_sentence)\n pos.append(temp_pos)\n char_ = [[char2idx['_ROOT']]]\n for w in temp_sentence:\n if w in char2idx:\n char_.append([char2idx[w]])\n else:\n char_.append([char2idx[c] for c in w])\n chars.append(char_)\n temp_word = []\n temp_depend = []\n temp_label = []\n temp_sentence = []\n temp_pos = []\n except Exception as e:\n print(e, sentence)\n return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]","execution_count":5,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-dev.conllu') as fopen:\n dev = fopen.read().split('\\n')\n\nsentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)","execution_count":6,"outputs":[{"output_type":"stream","text":"1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\ninvalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\ninvalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-test.conllu') as fopen:\n test = fopen.read().split('\\n')\n\nsentences_test, words_test, depends_test, 
labels_test, _, _ = process_corpus(test)\nsentences_test.extend(sentences_dev)\nwords_test.extend(words_dev)\ndepends_test.extend(depends_dev)\nlabels_test.extend(labels_dev)","execution_count":7,"outputs":[{"output_type":"stream","text":"1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\ninvalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"with open('en_ewt-ud-train.conllu') as fopen:\n train = fopen.read().split('\\n')\n\nsentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)","execution_count":8,"outputs":[{"output_type":"stream","text":"1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\ninvalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\ninvalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\ninvalid literal for int() with base 10: '_' ['9.1', 
'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\ninvalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\ninvalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\ninvalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\ninvalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\ninvalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\ninvalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\ninvalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\ninvalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 
'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\ninvalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\ninvalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"len(sentences_train), len(sentences_test)","execution_count":9,"outputs":[{"output_type":"execute_result","execution_count":9,"data":{"text/plain":"(12000, 3824)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"idx2word = {v:k for k, v in word2idx.items()}\nidx2tag = {v:k for k, v in tag2idx.items()}\nlen(idx2word)","execution_count":10,"outputs":[{"output_type":"execute_result","execution_count":10,"data":{"text/plain":"21974"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"def generate_char_seq(batch, UNK = 2):\n maxlen_c = max([len(k) for k in batch])\n x = [[len(i) for i in k] for k in batch]\n maxlen = max([j for i in x for j in i])\n temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n for i in range(len(batch)):\n for k in range(len(batch[i])):\n for no, c in enumerate(batch[i][k]):\n temp[i,k,-1-no] = char2idx.get(c, UNK)\n return temp","execution_count":11,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"generate_char_seq(sentences_train[:5]).shape","execution_count":12,"outputs":[{"output_type":"execute_result","execution_count":12,"data":{"text/plain":"(5, 36, 11)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"pad_sequences(words_train[:5],padding='post').shape","execution_count":13,"outputs":[{"output_type":"execute_result","execution_count":13,"data":{"text/plain":"(5, 36)"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"train_X = words_train\ntrain_Y = 
labels_train\ntrain_depends = depends_train\ntrain_char = sentences_train\n\ntest_X = words_test\ntest_Y = labels_test\ntest_depends = depends_test\ntest_char = sentences_test","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"class BiAAttention:\n def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n self.input_size_encoder = input_size_encoder\n self.input_size_decoder = input_size_decoder\n self.num_labels = num_labels\n \n self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n initializer=tf.contrib.layers.xavier_initializer())\n \n def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n batch = tf.shape(input_d)[0]\n length_decoder = tf.shape(input_d)[1]\n length_encoder = tf.shape(input_e)[1]\n out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n \n output = output + out_d + out_e\n \n if mask_d is not None:\n d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n output = output * d * e\n \n return output\n\nclass Model:\n def __init__(\n self,\n dim_word,\n dim_char,\n dropout,\n learning_rate,\n hidden_size_char,\n hidden_size_word,\n num_layers\n ):\n def cells(size, reuse = False):\n return tf.contrib.rnn.DropoutWrapper(\n tf.nn.rnn_cell.LSTMCell(\n size,\n initializer = tf.orthogonal_initializer(),\n reuse = reuse,\n ),\n output_keep_prob = dropout,\n )\n 
\n def luong(embedded, size):\n attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n num_units = hidden_size_word, memory = embedded\n )\n return tf.contrib.seq2seq.AttentionWrapper(\n cell = cells(hidden_size_word),\n attention_mechanism = attention_mechanism,\n attention_layer_size = hidden_size_word,\n )\n \n self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n self.labels = tf.placeholder(tf.int32, shape = [None, None])\n self.depends = tf.placeholder(tf.int32, shape = [None, None])\n self.maxlen = tf.shape(self.word_ids)[1]\n self.lengths = tf.count_nonzero(self.word_ids, 1)\n self.mask = tf.math.not_equal(self.word_ids, 0)\n float_mask = tf.cast(self.mask, tf.float32)\n \n self.arc_h = tf.layers.Dense(hidden_size_word)\n self.arc_c = tf.layers.Dense(hidden_size_word)\n self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n\n self.word_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n )\n )\n self.char_embeddings = tf.Variable(\n tf.truncated_normal(\n [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n )\n )\n\n word_embedded = tf.nn.embedding_lookup(\n self.word_embeddings, self.word_ids\n )\n char_embedded = tf.nn.embedding_lookup(\n self.char_embeddings, self.char_ids\n )\n s = tf.shape(char_embedded)\n char_embedded = tf.reshape(\n char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n )\n\n for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = cells(hidden_size_char),\n cell_bw = cells(hidden_size_char),\n inputs = char_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_char_%d' % (n),\n )\n char_embedded = tf.concat((out_fw, out_bw), 2)\n output = tf.reshape(\n char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n )\n word_embedded = tf.concat([word_embedded, output], axis = -1)\n\n 
for n in range(num_layers):\n (out_fw, out_bw), (\n state_fw,\n state_bw,\n ) = tf.nn.bidirectional_dynamic_rnn(\n cell_fw = luong(word_embedded, hidden_size_word),\n cell_bw = luong(word_embedded, hidden_size_word),\n inputs = word_embedded,\n dtype = tf.float32,\n scope = 'bidirectional_rnn_word_%d' % (n),\n )\n word_embedded = tf.concat((out_fw, out_bw), 2)\n\n logits = tf.layers.dense(word_embedded, len(idx2tag))\n log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n logits, self.labels, self.lengths\n )\n arc_h = tf.nn.elu(self.arc_h(word_embedded))\n arc_c = tf.nn.elu(self.arc_c(word_embedded))\n out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n \n batch = tf.shape(out_arc)[0]\n batch_index = tf.range(0, batch)\n max_len = tf.shape(out_arc)[1]\n sec_max_len = tf.shape(out_arc)[2]\n \n minus_inf = -1e8\n minus_mask = (1 - float_mask) * minus_inf\n out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n \n child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n t = tf.transpose(self.depends)\n broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n tf.expand_dims(t, axis = 0),\n tf.expand_dims(child_index, axis = 0)], axis = 0))\n loss_arc = tf.gather_nd(loss_arc, concatenated)\n loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n \n loss_arc = tf.reduce_sum(-loss_arc) / num\n \n self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n \n self.optimizer = tf.train.AdamOptimizer(\n learning_rate = learning_rate\n ).minimize(self.cost)\n \n mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n \n self.tags_seq, _ = 
tf.contrib.crf.crf_decode(\n logits, transition_params, self.lengths\n )\n \n out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n self.heads = tf.argmax(out_arc, axis = 1)\n \n self.prediction = tf.boolean_mask(self.tags_seq, mask)\n mask_label = tf.boolean_mask(self.labels, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n \n self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n mask_label = tf.boolean_mask(self.depends, mask)\n correct_pred = tf.equal(self.prediction, mask_label)\n correct_index = tf.cast(correct_pred, tf.float32)\n self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tf.reset_default_graph()\nsess = tf.InteractiveSession()\n\ndim_word = 128\ndim_char = 256\ndropout = 1.0\nlearning_rate = 1e-3\nhidden_size_char = 128\nhidden_size_word = 128\nnum_layers = 2\n\nmodel = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\nsess.run(tf.global_variables_initializer())","execution_count":16,"outputs":[{"output_type":"stream","text":"WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\nWARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"batch_x = train_X[:5]\nbatch_x = pad_sequences(batch_x,padding='post')\nbatch_char = train_char[:5]\nbatch_char = generate_char_seq(batch_char)\nbatch_y = train_Y[:5]\nbatch_y = pad_sequences(batch_y,padding='post')\nbatch_depends = train_depends[:5]\nbatch_depends = pad_sequences(batch_depends,padding='post')","execution_count":17,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"sess.run([model.accuracy, model.accuracy_depends, model.cost],\n feed_dict = {model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends})","execution_count":18,"outputs":[{"output_type":"execute_result","execution_count":18,"data":{"text/plain":"[0.01724138, 0.03448276, 94.80077]"},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from tqdm import tqdm\n\nbatch_size = 32\nepoch = 15\n\nfor e in 
range(epoch):\n train_acc, train_loss = [], []\n test_acc, test_loss = [], []\n train_acc_depends, test_acc_depends = [], []\n \n pbar = tqdm(\n range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(train_X))\n batch_x = train_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = train_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = train_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = train_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost, _ = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n train_loss.append(cost)\n train_acc.append(acc)\n train_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n pbar = tqdm(\n range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(test_X))\n batch_x = test_X[i: index]\n batch_x = pad_sequences(batch_x,padding='post')\n batch_char = test_char[i: index]\n batch_char = generate_char_seq(batch_char)\n batch_y = test_Y[i: index]\n batch_y = pad_sequences(batch_y,padding='post')\n batch_depends = test_depends[i: index]\n batch_depends = pad_sequences(batch_depends,padding='post')\n \n acc_depends, acc, cost = sess.run(\n [model.accuracy_depends, model.accuracy, model.cost],\n feed_dict = {\n model.word_ids: batch_x,\n model.char_ids: batch_char,\n model.labels: batch_y,\n model.depends: batch_depends\n },\n )\n test_loss.append(cost)\n test_acc.append(acc)\n test_acc_depends.append(acc_depends)\n pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n \n \n print(\n 'epoch: %d, training loss: %f, training acc: 
%f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n % (e, np.mean(train_loss), \n np.mean(train_acc), \n np.mean(train_acc_depends), \n np.mean(test_loss), \n np.mean(test_acc), \n np.mean(test_acc_depends)\n ))\n ","execution_count":19,"outputs":[{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 375/375 [02:45<00:00, 2.27it/s, accuracy=0.76, accuracy_depends=0.563, cost=19.7] \ntest minibatch loop: 100%|██████████| 120/120 [00:22<00:00, 5.26it/s, accuracy=0.789, accuracy_depends=0.636, cost=12.2]\ntrain minibatch loop: 0%| | 0/375 [00:000]" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([18, 19, 2, 6, 3, 7, 16, 18, 23, 20, 19, 2], dtype=int32)" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch_y[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 14, 11, 5, 6, 0, 3, 11, 11, 11, 6, 6], dtype=int32)" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "deps[seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "array([ 2, 6, 6, 5, 6, 0, 6, 11, 11, 11, 6, 6], dtype=int32)" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch_depends[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": 
"python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/dependency-parser/4.bert-crf-biaffine.ipynb b/dependency-parser/4.bert-crf-biaffine.ipynb new file mode 100644 index 0000000..d38c52d --- /dev/null +++ b/dependency-parser/4.bert-crf-biaffine.ipynb @@ -0,0 +1,1217 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n", + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "tag2idx = {'PAD': 0, 'X': 1}\n", + "tag_idx = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: 
Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) 
/ '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def process_corpus(corpus, until = None):\n", + " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", + " sentences, words, depends, labels, pos, sequences = [], [], [], [], [], []\n", + " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n", + " first_time = True\n", + " for sentence in corpus:\n", + " try:\n", + " if len(sentence):\n", + " if sentence[0] == '#':\n", + " continue\n", + " if first_time:\n", + " print(sentence)\n", + " first_time = False\n", + " sentence = sentence.split('\\t')\n", + " if sentence[7] not in tag2idx:\n", + " tag2idx[sentence[7]] = tag_idx\n", + " tag_idx += 1\n", + " temp_word.append(sentence[1])\n", + " temp_depend.append(int(sentence[6]) + 1)\n", + " temp_label.append(tag2idx[sentence[7]])\n", + " temp_sentence.append(sentence[1])\n", + " temp_pos.append(sentence[3])\n", + " else:\n", + " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " continue\n", + " bert_tokens = ['[CLS]']\n", + " labels_ = [0]\n", + " depends_ = [0]\n", + " seq_ = []\n", + " for no, orig_token in enumerate(temp_word):\n", + " 
labels_.append(temp_label[no])\n", + " depends_.append(temp_depend[no])\n", + " t = tokenizer.tokenize(orig_token)\n", + " bert_tokens.extend(t)\n", + " labels_.extend([1] * (len(t) - 1))\n", + " depends_.extend([0] * (len(t) - 1))\n", + " seq_.append(no + 1)\n", + " bert_tokens.append('[SEP]')\n", + " labels_.append(0)\n", + " depends_.append(0)\n", + " words.append(tokenizer.convert_tokens_to_ids(bert_tokens))\n", + " depends.append(depends_)\n", + " labels.append(labels_)\n", + " sentences.append(temp_sentence)\n", + " pos.append(temp_pos)\n", + " sequences.append(seq_)\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " except Exception as e:\n", + " print(e, sentence)\n", + " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], sequences[:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n", + "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-dev.conllu') as fopen:\n", + " dev = fopen.read().split('\\n')\n", + "\n", + "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev = process_corpus(dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "([101, 1622, 1103, 10997, 2502, 1142, 1642, 131, 102],\n", + " [0, 4, 4, 5, 1, 7, 5, 5, 0],\n", + " [1, 2, 3, 4, 5, 6, 7])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "words_dev[0], depends_dev[0], seq_dev[0]" + ] + }, + { + "cell_type": "code", + 
"execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n", + "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-test.conllu') as fopen:\n", + " test = fopen.read().split('\\n')\n", + "\n", + "sentences_test, words_test, depends_test, labels_test, _, seq_test = process_corpus(test)\n", + "sentences_test.extend(sentences_dev)\n", + "words_test.extend(words_dev)\n", + "depends_test.extend(depends_dev)\n", + "labels_test.extend(labels_dev)\n", + "seq_test.extend(seq_dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n", + "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n", + "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n", + "invalid literal for int() with 
base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n", + "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n", + "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n", + "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid 
literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n", + "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-train.conllu') as fopen:\n", + " train = fopen.read().split('\\n')\n", + "\n", + "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(12000, 3824)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(sentences_train), len(sentences_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "idx2tag = {v:k for k, v in tag2idx.items()}" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = words_train\n", + "train_Y = labels_train\n", + "train_depends = depends_train\n", + "\n", + "test_X = words_test\n", + "test_Y = labels_test\n", + "test_depends = depends_test" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "epoch = 15\n", + "batch_size = 32\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(train_X) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)\n", + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + 
{ + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "class BiAAttention:\n", + " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n", + " self.input_size_encoder = input_size_encoder\n", + " self.input_size_decoder = input_size_decoder\n", + " self.num_labels = num_labels\n", + " \n", + " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n", + " batch = tf.shape(input_d)[0]\n", + " length_decoder = tf.shape(input_d)[1]\n", + " length_encoder = tf.shape(input_e)[1]\n", + " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n", + " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n", + " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n", + " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n", + " \n", + " output = output + out_d + out_e\n", + " \n", + " if mask_d is not None:\n", + " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n", + " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n", + " output = output * d * e\n", + " \n", + " return output\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " learning_rate,\n", + " hidden_size_word,\n", + " ):\n", + " def cells(size, reuse = False):\n", + " return tf.contrib.rnn.DropoutWrapper(\n", + " tf.nn.rnn_cell.LSTMCell(\n", + " size,\n", + " initializer = tf.orthogonal_initializer(),\n", + " reuse = reuse,\n", + " ),\n", + " 
output_keep_prob = dropout,\n", + " )\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n", + " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n", + " self.maxlen = tf.shape(self.X)[1]\n", + " self.lengths = tf.count_nonzero(self.X, 1)\n", + " self.mask = tf.math.not_equal(self.X, 0)\n", + " float_mask = tf.cast(self.mask, tf.float32)\n", + " \n", + " self.arc_h = tf.layers.Dense(hidden_size_word)\n", + " self.arc_c = tf.layers.Dense(hidden_size_word)\n", + " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n", + "\n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=True,\n", + " input_ids=self.X,\n", + " use_one_hot_embeddings=False)\n", + " output_layer = model.get_sequence_output()\n", + "\n", + " logits = tf.layers.dense(output_layer, len(idx2tag))\n", + " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n", + " logits, self.labels, self.lengths\n", + " )\n", + " arc_h = tf.nn.elu(self.arc_h(output_layer))\n", + " arc_c = tf.nn.elu(self.arc_c(output_layer))\n", + " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=float_mask, mask_e=float_mask), axis = 1)\n", + " \n", + " batch = tf.shape(out_arc)[0]\n", + " batch_index = tf.range(0, batch)\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " \n", + " minus_inf = -1e8\n", + " minus_mask = (1 - float_mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_arc = loss_arc * tf.expand_dims(float_mask, axis = 2) * tf.expand_dims(float_mask, axis = 1)\n", + " num = tf.reduce_sum(float_mask) - tf.cast(batch, tf.float32)\n", + " \n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n", + " t = tf.transpose(self.depends)\n", + " broadcasted 
= tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])[1:]\n", + " \n", + " loss_arc = tf.reduce_sum(-loss_arc) / num\n", + " \n", + " self.cost = tf.reduce_mean(-log_likelihood) + loss_arc\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " \n", + " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", + " \n", + " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n", + " logits, transition_params, self.lengths\n", + " )\n", + " \n", + " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1.0 - float_mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n", + " self.heads = tf.argmax(out_arc, axis = 1)\n", + " \n", + " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", + " mask_label = tf.boolean_mask(self.labels, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " self.prediction = tf.cast(tf.boolean_mask(self.heads, mask), tf.int32)\n", + " mask_label = tf.boolean_mask(self.depends, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:99: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:213: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From :83: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "dim is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. 
Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Deprecated in favor of operator or tf.math.divide.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "hidden_size_word = 128\n", + "learning_rate = 2e-5\n", + "\n", + "model = Model(learning_rate,hidden_size_word)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + 
"outputs": [], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_y = train_Y[:5]\n", + "batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.028169014, 0.03521127, 124.20428]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n", + " feed_dict = {model.X: batch_x,\n", + " model.labels: batch_y,\n", + " model.depends: batch_depends})" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:38<00:00, 3.82it/s, accuracy=0.925, accuracy_depends=0.278, cost=10.6] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:12<00:00, 9.31it/s, accuracy=0.945, accuracy_depends=0.377, cost=6.01]\n", + "train minibatch loop: 0%| | 0/375 [00:000]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "batch_y[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "deps[seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "batch_depends[0][seq>0]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": 
"ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/dependency-parser/5.attention-is-all-you-need.ipynb b/dependency-parser/5.attention-is-all-you-need.ipynb deleted file mode 100644 index a5baba0..0000000 --- a/dependency-parser/5.attention-is-all-you-need.ipynb +++ /dev/null @@ -1,1743 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "from tqdm import tqdm\n", - "import numpy as np\n", - "import re" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "with open('id_gsd-ud-train.conllu.txt') as fopen:\n", - " corpus = fopen.read().split('\\n')\n", - " \n", - "with open('id_gsd-ud-test.conllu.txt') as fopen:\n", - " corpus.extend(fopen.read().split('\\n'))\n", - " \n", - "with open('id_gsd-ud-dev.conllu.txt') as fopen:\n", - " corpus.extend(fopen.read().split('\\n'))" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "word2idx = {'PAD': 0,'NUM':1,'UNK':2}\n", - "tag2idx = {'PAD': 0}\n", - "char2idx = {'PAD': 0,'NUM':1,'UNK':2}\n", - "word_idx = 3\n", - "tag_idx = 1\n", - "char_idx = 3\n", - "\n", - "def process_string(string):\n", - " string = re.sub('[^A-Za-z0-9\\-\\/ ]+', ' ', string).split()\n", - " return [to_title(y.strip()) for y in string]\n", - "\n", - "def to_title(string):\n", - " if string.isupper():\n", - " string = string.title()\n", - " return string\n", - "\n", - "def process_corpus(corpus, until = None):\n", - " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", - " sentences, words, depends, labels, pos = [], [], [], [], []\n", - " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], 
[], []\n", - " for sentence in corpus:\n", - " if len(sentence):\n", - " if sentence[0] == '#':\n", - " continue\n", - " sentence = sentence.split('\\t')\n", - " temp = process_string(sentence[1])\n", - " if not len(temp):\n", - " sentence[1] = 'EMPTY'\n", - " sentence[1] = process_string(sentence[1])[0]\n", - " for c in sentence[1]:\n", - " if c not in char2idx:\n", - " char2idx[c] = char_idx\n", - " char_idx += 1\n", - " if sentence[7] not in tag2idx:\n", - " tag2idx[sentence[7]] = tag_idx\n", - " tag_idx += 1\n", - " if sentence[1] not in word2idx:\n", - " word2idx[sentence[1]] = word_idx\n", - " word_idx += 1\n", - " temp_word.append(word2idx[sentence[1]])\n", - " temp_depend.append(int(sentence[6]) + 1)\n", - " temp_label.append(tag2idx[sentence[7]])\n", - " temp_sentence.append(sentence[1])\n", - " temp_pos.append(sentence[3])\n", - " else:\n", - " words.append(temp_word)\n", - " depends.append(temp_depend)\n", - " labels.append(temp_label)\n", - " sentences.append(temp_sentence)\n", - " pos.append(temp_pos)\n", - " temp_word = []\n", - " temp_depend = []\n", - " temp_label = []\n", - " temp_sentence = []\n", - " temp_pos = []\n", - " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1]\n", - " \n", - "sentences, words, depends, labels, pos = process_corpus(corpus)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "with open('augmented.json') as fopen:\n", - " augmented = json.load(fopen)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def parse_XY(texts):\n", - " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", - " outside, sentences = [], []\n", - " for no, text in enumerate(texts):\n", - " s = process_string(text)\n", - " sentences.append(s)\n", - " inside = []\n", - " for w in s:\n", - " for c in w:\n", - " if c not in char2idx:\n", - " char2idx[c] = char_idx\n", - " 
char_idx += 1\n", - " \n", - " if w not in word2idx:\n", - " word2idx[w] = word_idx\n", - " word_idx += 1\n", - " \n", - " inside.append(word2idx[w])\n", - " outside.append(inside)\n", - " return outside, sentences" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "text_augmented = []\n", - "for a in augmented:\n", - " text_augmented.extend(a[0])\n", - " depends.extend(a[1])\n", - " labels.extend(a[2])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "outside, new_sentences = parse_XY(text_augmented)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using TensorFlow backend.\n" - ] - } - ], - "source": [ - "from keras.preprocessing.sequence import pad_sequences" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "words.extend(outside)\n", - "sentences.extend(new_sentences)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(50365, 50365, 50365, 50365)" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(words), len(depends), len(labels), len(sentences)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def generate_char_seq(batch, UNK = 2):\n", - " maxlen_c = max([len(k) for k in batch])\n", - " x = [[len(i) for i in k] for k in batch]\n", - " maxlen = max([j for i in x for j in i])\n", - " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n", - " for i in range(len(batch)):\n", - " for k in range(len(batch[i])):\n", - " for no, c in enumerate(batch[i][k][:maxlen][::-1]):\n", - " temp[i,k,-1-no] = char2idx.get(c, UNK)\n", - " return temp" - ] - }, - { - 
"cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "idx2word = {idx: tag for tag, idx in word2idx.items()}\n", - "idx2tag = {i: w for w, i in tag2idx.items()}\n", - "char = generate_char_seq(sentences)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(50365, 189)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "words = pad_sequences(words,padding='post')\n", - "depends = pad_sequences(depends,padding='post')\n", - "labels = pad_sequences(labels,padding='post')\n", - "words.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. 
This module will be removed in 0.20.\n", - " \"This module will be removed in 0.20.\", DeprecationWarning)\n" - ] - } - ], - "source": [ - "from sklearn.cross_validation import train_test_split\n", - "train_X, test_X, train_Y, test_Y, train_depends, test_depends, train_char, test_char = train_test_split(\n", - " words,\n", - " labels,\n", - " depends,\n", - " char,\n", - " test_size=0.1)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - "\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " \n", - " outputs = gamma * normalized + beta\n", - " return outputs\n", - "\n", - "def multihead_attn(queries, keys, q_masks, k_masks, future_binding, num_units, num_heads):\n", - " \n", - " T_q = tf.shape(queries)[1] \n", - " T_k = tf.shape(keys)[1] \n", - "\n", - " Q = tf.layers.dense(queries, num_units, name='Q') \n", - " K_V = tf.layers.dense(keys, 2*num_units, name='K_V') \n", - " K, V = tf.split(K_V, 2, -1) \n", - "\n", - " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) \n", - " K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) \n", - " V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) \n", - "\n", - " align = tf.matmul(Q_, tf.transpose(K_, [0,2,1])) \n", - " align = align / np.sqrt(K_.get_shape().as_list()[-1]) \n", - "\n", - " paddings = tf.fill(tf.shape(align), 0.0) \n", - "\n", - " key_masks = k_masks \n", - " key_masks = tf.tile(key_masks, [num_heads, 1]) \n", - " key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, T_q, 1]) \n", - " align = tf.where(tf.equal(key_masks, 0), paddings, align) \n", - "\n", - " if future_binding:\n", 
- " lower_tri = tf.ones([T_q, T_k]) \n", - " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense() \n", - " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0], 1, 1]) \n", - " align = tf.where(tf.equal(masks, 0), paddings, align) \n", - " \n", - " align = tf.nn.softmax(align) \n", - " query_masks = tf.to_float(q_masks) \n", - " query_masks = tf.tile(query_masks, [num_heads, 1]) \n", - " query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, T_k]) \n", - " align *= query_masks\n", - " outputs = tf.matmul(align, V_) \n", - " outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2) \n", - " outputs += queries \n", - " outputs = layer_norm(outputs) \n", - " return outputs\n", - "\n", - "\n", - "def pointwise_feedforward(inputs, hidden_units, activation=None):\n", - " outputs = tf.layers.dense(inputs, 4*hidden_units, activation=activation)\n", - " outputs = tf.layers.dense(outputs, hidden_units, activation=None)\n", - " outputs += inputs\n", - " outputs = layer_norm(outputs)\n", - " return outputs\n", - "\n", - "\n", - "def learned_position_encoding(inputs, mask, embed_dim):\n", - " T = tf.shape(inputs)[1]\n", - " outputs = tf.range(tf.shape(inputs)[1]) # (T_q)\n", - " outputs = tf.expand_dims(outputs, 0) # (1, T_q)\n", - " outputs = tf.tile(outputs, [tf.shape(inputs)[0], 1]) # (N, T_q)\n", - " outputs = embed_seq(outputs, T, embed_dim, zero_pad=False, scale=False)\n", - " return tf.expand_dims(tf.to_float(mask), -1) * outputs\n", - "\n", - "\n", - "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n", - " T = tf.shape(inputs)[1]\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), 
-1)\n", - "\n", - "def label_smoothing(inputs, epsilon=0.1):\n", - " C = inputs.get_shape().as_list()[-1]\n", - " return ((1 - epsilon) * inputs) + (epsilon / C)\n", - "\n", - "\n", - "class CRF:\n", - " def __init__(self,\n", - " dim_word,\n", - " dim_char,\n", - " dropout,\n", - " learning_rate,\n", - " hidden_size_char,\n", - " hidden_size_word,\n", - " maxlen,\n", - " num_blocks = 2,\n", - " num_heads = 8,\n", - " min_freq = 50):\n", - " \n", - " self.word_ids = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.char_ids = tf.placeholder(tf.int32, shape = [None, None, None])\n", - " self.labels = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.depends = tf.placeholder(tf.int32, shape = [None, None])\n", - " self.maxlen = tf.shape(self.word_ids)[1]\n", - " self.lengths = tf.count_nonzero(self.word_ids, 1)\n", - " batch_size = tf.shape(self.word_ids)[0]\n", - " \n", - " self.word_embeddings = tf.Variable(\n", - " tf.truncated_normal(\n", - " [len(word2idx), dim_word], stddev = 1.0 / np.sqrt(dim_word)\n", - " )\n", - " )\n", - " self.char_embeddings = tf.Variable(\n", - " tf.truncated_normal(\n", - " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n", - " )\n", - " )\n", - " \n", - " word_embedded = tf.nn.embedding_lookup(\n", - " self.word_embeddings, self.word_ids\n", - " )\n", - " char_embedded = tf.nn.embedding_lookup(\n", - " self.char_embeddings, self.char_ids\n", - " )\n", - " s = tf.shape(char_embedded)\n", - " char_embedded = tf.reshape(\n", - " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n", - " )\n", - " reshape_char = tf.reshape(self.char_ids, shape = [s[0] * s[1], s[-2]])\n", - " char_masked = tf.sign(reshape_char)\n", - " char_embedded += sinusoidal_position_encoding(reshape_char, char_masked, dim_char)\n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('char_%d'%i,reuse=tf.AUTO_REUSE):\n", - " char_embedded = multihead_attn(queries = char_embedded,\n", - " keys = char_embedded,\n", - " 
q_masks = char_masked,\n", - " k_masks = char_masked,\n", - " future_binding = False,\n", - " num_units = dim_char,\n", - " num_heads = num_heads)\n", - " with tf.variable_scope('char_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " char_embedded = pointwise_feedforward(char_embedded,\n", - " dim_char,\n", - " activation = tf.nn.relu)\n", - " output = tf.reshape(\n", - " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n", - " )\n", - " \n", - " decoder_embedded = tf.concat([word_embedded, output], axis = -1)\n", - " decoder_embedded = tf.layers.dense(word_embedded, dim_char)\n", - " de_masks = tf.sign(self.word_ids)\n", - " \n", - " decoder_embedded += sinusoidal_position_encoding(self.word_ids, de_masks, dim_char)\n", - " \n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('word_char_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = decoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = True,\n", - " num_units = dim_char,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('word_char_attention_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = output,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = False,\n", - " num_units = dim_char,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('word_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = pointwise_feedforward(decoder_embedded,\n", - " dim_char,\n", - " activation = tf.nn.relu)\n", - " \n", - " logits = tf.layers.dense(decoder_embedded, len(idx2tag))\n", - " \n", - " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n", - " logits, self.labels, self.lengths\n", - " )\n", - " \n", - " tag_embeddings = tf.Variable(\n", - " tf.truncated_normal(\n", - " [len(idx2tag), dim_char], stddev = 1.0 / 
np.sqrt(dim_char)\n", - " )\n", - " )\n", - " logits_max = tf.argmax(logits,axis=2,output_type=tf.int32)\n", - " lookup_logits = tf.nn.embedding_lookup(\n", - " tag_embeddings, logits_max\n", - " )\n", - " \n", - " lookup_logits += sinusoidal_position_encoding(logits_max, de_masks, dim_char)\n", - " \n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('depend_%d'%i,reuse=tf.AUTO_REUSE):\n", - " lookup_logits = multihead_attn(queries = lookup_logits,\n", - " keys = lookup_logits,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = True,\n", - " num_units = dim_char,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('depend_attention_%d'%i,reuse=tf.AUTO_REUSE):\n", - " lookup_logits = multihead_attn(queries = lookup_logits,\n", - " keys = decoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = False,\n", - " num_units = dim_char,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('depend_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " lookup_logits = pointwise_feedforward(lookup_logits,\n", - " dim_char,\n", - " activation = tf.nn.relu)\n", - " \n", - " cast_mask = tf.cast(tf.sequence_mask(self.lengths + 1, maxlen = maxlen), dtype = tf.float32)\n", - " cast_mask = tf.tile(tf.expand_dims(cast_mask,axis=1),[1,self.maxlen,1]) * 10\n", - " \n", - " logits_depends = tf.layers.dense(lookup_logits, maxlen)\n", - " logits_depends = tf.multiply(logits_depends, cast_mask)\n", - " \n", - " with tf.variable_scope(\"depends\"):\n", - " log_likelihood_depends, transition_params_depends = tf.contrib.crf.crf_log_likelihood(\n", - " logits_depends, self.depends, self.lengths\n", - " )\n", - " \n", - " self.cost = tf.reduce_mean(-log_likelihood) + tf.reduce_mean(-log_likelihood_depends)\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " mask = tf.sequence_mask(self.lengths, 
maxlen = self.maxlen)\n", - " \n", - " self.tags_seq, _ = tf.contrib.crf.crf_decode(\n", - " logits, transition_params, self.lengths\n", - " )\n", - " self.tags_seq = tf.identity(self.tags_seq, name = 'logits')\n", - " \n", - " self.tags_seq_depends, _ = tf.contrib.crf.crf_decode(\n", - " logits_depends, transition_params_depends, self.lengths\n", - " )\n", - " self.tags_seq_depends = tf.identity(self.tags_seq_depends, name = 'logits_depends')\n", - "\n", - " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", - " mask_label = tf.boolean_mask(self.labels, mask)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", - " \n", - " self.prediction = tf.boolean_mask(self.tags_seq_depends, mask)\n", - " mask_label = tf.boolean_mask(self.depends, mask)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "dim_word = 128\n", - "dim_char = 256\n", - "dropout = 0.8\n", - "learning_rate = 1e-3\n", - "hidden_size_char = 128\n", - "hidden_size_word = 64\n", - "batch_size = 8\n", - "\n", - "model = CRF(dim_word = dim_word,\n", - " dim_char = dim_char,\n", - " dropout = dropout,\n", - " learning_rate = learning_rate,\n", - " hidden_size_char = hidden_size_char,\n", - " hidden_size_word = hidden_size_word,\n", - " maxlen = words.shape[1])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 5666/5666 [1:03:18<00:00, 1.51it/s, accuracy=0.756, accuracy_depends=0.524, cost=51.9] \n", - "test minibatch loop: 100%|██████████| 630/630 [02:52<00:00, 4.17it/s, accuracy=0.707, accuracy_depends=0.515, cost=53.7]\n", - "train minibatch loop: 0%| | 0/5666 [00:00'\n", + " elif re.match(_regex._money, w):\n", + " return ''\n", + " elif re.match(_regex._date, w):\n", + " return ''\n", + " elif re.match(_regex._expressions['email'], w):\n", + " return ''\n", + " elif re.match(_regex._expressions['url'], w):\n", + " return ''\n", + " else:\n", + " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n", + " return w" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "({'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2,\n", + " '': 3,\n", + " '': 4,\n", + " '': 5,\n", + " '': 6,\n", + " '': 7},\n", + " {'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2,\n", + " '': 3,\n", + " '': 4,\n", + " '': 5,\n", + " '': 6,\n", + " '': 7})" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "tag2idx = {'PAD': 0, 
'_': 1}\n", + "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "word_idx = 3\n", + "tag_idx = 2\n", + "char_idx = 3\n", + "\n", + "special_tokens = ['', '', '', '', '']\n", + "\n", + "for t in special_tokens:\n", + " word2idx[t] = word_idx\n", + " word_idx += 1\n", + " char2idx[t] = char_idx\n", + " char_idx += 1\n", + " \n", + "word2idx, char2idx" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "PAD = \"_PAD\"\n", + "PAD_POS = \"_PAD_POS\"\n", + "PAD_TYPE = \"_\"\n", + "PAD_CHAR = \"_PAD_CHAR\"\n", + "ROOT = \"_ROOT\"\n", + "ROOT_POS = \"_ROOT_POS\"\n", + "ROOT_TYPE = \"_\"\n", + "ROOT_CHAR = \"_ROOT_CHAR\"\n", + "END = \"_END\"\n", + "END_POS = \"_END_POS\"\n", + "END_TYPE = \"_\"\n", + "END_CHAR = \"_END_CHAR\"\n", + "\n", + "def process_corpus(corpus, until = None):\n", + " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", + " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n", + " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n", + " first_time = True\n", + " for sentence in corpus:\n", + " try:\n", + " if len(sentence):\n", + " if sentence[0] == '#':\n", + " continue\n", + " if first_time:\n", + " print(sentence)\n", + " first_time = False\n", + " sentence = sentence.split('\\t')\n", + " for c in sentence[1]:\n", + " if c not in char2idx:\n", + " char2idx[c] = char_idx\n", + " char_idx += 1\n", + " if sentence[7] not in tag2idx:\n", + " tag2idx[sentence[7]] = tag_idx\n", + " tag_idx += 1\n", + " sentence[1] = preprocessing(sentence[1])\n", + " if sentence[1] not in word2idx:\n", + " word2idx[sentence[1]] = word_idx\n", + " word_idx += 1\n", + " temp_word.append(word2idx[sentence[1]])\n", + " temp_depend.append(int(sentence[6]))\n", + " temp_label.append(tag2idx[sentence[7]])\n", + " temp_sentence.append(sentence[1])\n", + " temp_pos.append(sentence[3])\n", + " else:\n", + " if len(temp_sentence) < 2 or len(temp_word) != 
len(temp_label):\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " continue\n", + " words.append(temp_word)\n", + " depends.append(temp_depend)\n", + " labels.append(temp_label)\n", + " sentences.append( temp_sentence)\n", + " pos.append(temp_pos)\n", + " char_ = [[char2idx['_ROOT']]]\n", + " for w in temp_sentence:\n", + " if w in char2idx:\n", + " char_.append([char2idx[w]])\n", + " else:\n", + " char_.append([char2idx[c] for c in w])\n", + " chars.append(char_)\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " except Exception as e:\n", + " print(e, sentence)\n", + " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n", + "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-dev.conllu') as fopen:\n", + " dev = fopen.read().split('\\n')\n", + "\n", + "sentences_dev, words_dev, depends_dev, labels_dev, _, _ = process_corpus(dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n", + "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-test.conllu') as fopen:\n", + " test = fopen.read().split('\\n')\n", + 
"\n", + "sentences_test, words_test, depends_test, labels_test, _, _ = process_corpus(test)\n", + "sentences_test.extend(sentences_dev)\n", + "words_test.extend(words_dev)\n", + "depends_test.extend(depends_dev)\n", + "labels_test.extend(labels_dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n", + "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n", + "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n", + "invalid literal for 
int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n", + "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n", + "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n", + "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n", + "invalid literal for int() with base 
10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-train.conllu') as fopen:\n", + " train = fopen.read().split('\\n')\n", + "\n", + "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(12000, 3824)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(sentences_train), len(sentences_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "21974" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "idx2word = {v:k for k, v in word2idx.items()}\n", + "idx2tag = {v:k for k, v in tag2idx.items()}\n", + "len(idx2word)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_char_seq(batch, UNK = 2):\n", + " maxlen_c = max([len(k) for k in batch])\n", + " x = [[len(i) for i in k] for k in batch]\n", + " maxlen = max([j for i in x for j in i])\n", + " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n", + " for i in range(len(batch)):\n", + " for k in range(len(batch[i])):\n", + " for no, c in enumerate(batch[i][k]):\n", + " temp[i,k,-1-no] = char2idx.get(c, UNK)\n", + " return temp" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = words_train\n", + "train_Y = labels_train\n", + "train_depends = depends_train\n", + "train_char = sentences_train\n", + "\n", + "test_X = words_test\n", + "test_Y = labels_test\n", + "test_depends = depends_test\n", + "test_char = sentences_test" + ] + }, + { + "cell_type": "code", + 
"execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "class BiAAttention:\n", + " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n", + " self.input_size_encoder = input_size_encoder\n", + " self.input_size_decoder = input_size_decoder\n", + " self.num_labels = num_labels\n", + " \n", + " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n", + " batch = tf.shape(input_d)[0]\n", + " length_decoder = tf.shape(input_d)[1]\n", + " length_encoder = tf.shape(input_e)[1]\n", + " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n", + " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n", + " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n", + " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n", + " \n", + " output = output + out_d + out_e\n", + " \n", + " if mask_d is not None:\n", + " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n", + " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n", + " output = output * d * e\n", + " \n", + " return output\n", + " \n", + "class BiLinear:\n", + " def __init__(self, left_features, right_features, out_features):\n", + " self.left_features = left_features\n", + " self.right_features = right_features\n", + " self.out_features = out_features\n", + " \n", + " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n", + " 
initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_left, input_right):\n", + " left_size = tf.shape(input_left)\n", + " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n", + " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n", + " input_left = tf.reshape(input_left, (batch, self.left_features))\n", + " input_right = tf.reshape(input_right, (batch, self.right_features))\n", + " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n", + " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n", + " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n", + " + tf.matmul(input_right, tf.transpose(self.W_r))\n", + " \n", + " return tf.reshape(output, output_shape)\n", + "\n", + "class Attention:\n", + " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n", + " hidden_size, encoder_layers, num_labels, arc_space, type_space):\n", + " \n", + " def cells(size, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size,\n", + " initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n", + " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n", + " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n", + " self.num_labels = num_labels\n", + " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(encoder_layers)])\n", + "\n", + " \n", + " \n", + " def encode(self, input_word, input_char):\n", + " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n", 
+ " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n", + " b = tf.shape(char)[0]\n", + " wl = tf.shape(char)[1]\n", + " cl = tf.shape(char)[2]\n", + " d = char.shape[3]\n", + " char = tf.reshape(char, [b * wl, cl, d])\n", + " char = tf.reduce_max(self.conv1d(char), axis = 1)\n", + " char = tf.nn.tanh(char)\n", + " d = char.shape[-1]\n", + " char = tf.reshape(char, [b, wl, d])\n", + " \n", + " src_encoding = tf.concat([word, char], axis=2)\n", + " output, hn = tf.nn.dynamic_rnn(self.encoder, src_encoding, dtype = tf.float32,\n", + " scope = 'encoder')\n", + " arc_h = tf.nn.elu(self.arc_h(output))\n", + " arc_c = tf.nn.elu(self.arc_c(output))\n", + " \n", + " type_h = tf.nn.elu(self.type_h(output))\n", + " type_c = tf.nn.elu(self.type_c(output))\n", + " \n", + " return (arc_h, arc_c), (type_h, type_c), hn\n", + " \n", + " def forward(self, input_word, input_char, mask):\n", + " arcs, types, _ = self.encode(input_word, input_char)\n", + " \n", + " out_arc = tf.squeeze(self.attention.forward(arcs[0], arcs[1], mask_d=mask, mask_e=mask), axis = 1)\n", + " return out_arc, types, mask\n", + " \n", + " def loss(self, input_word, input_char, mask, heads, types):\n", + " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n", + " type_h, type_c = out_type\n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " minus_inf = -1e8\n", + " minus_mask = (1 - mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_type = 
tf.nn.log_softmax(out_type, dim=2)\n", + " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n", + " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n", + " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])\n", + " \n", + " t = tf.transpose(types)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_type = tf.gather_nd(loss_type, concatenated)\n", + " loss_type = tf.transpose(loss_type, [1, 0])\n", + " return tf.reduce_sum(-loss_arc) / num, tf.reduce_sum(-loss_type) / num\n", + " \n", + " def decode(self, input_word, input_char, mask, leading_symbolic=0):\n", + " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " out_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n", + " heads = tf.argmax(out_arc, axis = 1)\n", + " type_h, type_c = out_type\n", + " batch = tf.shape(type_h)[0]\n", + " max_len = tf.shape(type_h)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = 
tf.cast(tf.transpose(heads), tf.int32)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " out_type = out_type[:, :, leading_symbolic:]\n", + " types = tf.argmax(out_type, axis = 2)\n", + " return heads, types\n", + " \n", + "class Model:\n", + " def __init__(\n", + " self, \n", + " dim_word,\n", + " dim_char,\n", + " dropout,\n", + " learning_rate,\n", + " hidden_size_char,\n", + " hidden_size_word,\n", + " num_layers,\n", + " cov = 0.0):\n", + " \n", + " def cells(size, reuse = False):\n", + " return tf.contrib.rnn.DropoutWrapper(\n", + " tf.nn.rnn_cell.LSTMCell(\n", + " size,\n", + " initializer = tf.orthogonal_initializer(),\n", + " reuse = reuse,\n", + " ),\n", + " output_keep_prob = dropout,\n", + " )\n", + " \n", + " self.words = tf.placeholder(tf.int32, (None, None))\n", + " self.chars = tf.placeholder(tf.int32, (None, None, None))\n", + " self.heads = tf.placeholder(tf.int32, (None, None))\n", + " self.types = tf.placeholder(tf.int32, (None, None))\n", + " self.mask = tf.cast(tf.math.not_equal(self.words, 0), tf.float32)\n", + " self.maxlen = tf.shape(self.words)[1]\n", + " self.lengths = tf.count_nonzero(self.words, 1)\n", + " mask = self.mask\n", + " heads = self.heads\n", + " types = self.types\n", + " \n", + " self.arc_h = tf.layers.Dense(hidden_size_word)\n", + " self.arc_c = tf.layers.Dense(hidden_size_word)\n", + " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n", + "\n", + " self.type_h = tf.layers.Dense(hidden_size_word)\n", + " self.type_c = tf.layers.Dense(hidden_size_word)\n", + " self.bilinear = BiLinear(hidden_size_word, hidden_size_word, len(tag2idx))\n", + " \n", + " self.word_embeddings = tf.Variable(\n", + " tf.truncated_normal(\n", + " [len(word2idx), 
dim_word], stddev = 1.0 / np.sqrt(dim_word)\n", + " )\n", + " )\n", + " self.char_embeddings = tf.Variable(\n", + " tf.truncated_normal(\n", + " [len(char2idx), dim_char], stddev = 1.0 / np.sqrt(dim_char)\n", + " )\n", + " )\n", + "\n", + " word_embedded = tf.nn.embedding_lookup(\n", + " self.word_embeddings, self.words\n", + " )\n", + " char_embedded = tf.nn.embedding_lookup(\n", + " self.char_embeddings, self.chars\n", + " )\n", + " s = tf.shape(char_embedded)\n", + " char_embedded = tf.reshape(\n", + " char_embedded, shape = [s[0] * s[1], s[-2], dim_char]\n", + " )\n", + "\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (\n", + " state_fw,\n", + " state_bw,\n", + " ) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(hidden_size_char),\n", + " cell_bw = cells(hidden_size_char),\n", + " inputs = char_embedded,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_char_%d' % (n),\n", + " )\n", + " char_embedded = tf.concat((out_fw, out_bw), 2)\n", + " output = tf.reshape(\n", + " char_embedded[:, -1], shape = [s[0], s[1], 2 * hidden_size_char]\n", + " )\n", + " word_embedded = tf.concat([word_embedded, output], axis = -1)\n", + "\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (\n", + " state_fw,\n", + " state_bw,\n", + " ) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(hidden_size_word),\n", + " cell_bw = cells(hidden_size_word),\n", + " inputs = word_embedded,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_word_%d' % (n),\n", + " )\n", + " word_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " \n", + " arc_h = tf.nn.elu(self.arc_h(word_embedded))\n", + " arc_c = tf.nn.elu(self.arc_c(word_embedded))\n", + " \n", + " type_h = tf.nn.elu(self.type_h(word_embedded))\n", + " type_c = tf.nn.elu(self.type_c(word_embedded))\n", + " \n", + " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=self.mask, \n", + " mask_e=self.mask), axis = 1)\n", + " \n", + " batch = 
tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " batch_index = tf.range(0, batch)\n", + " \n", + " decode_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " decode_arc = tf.where(minus_mask, tf.fill(tf.shape(decode_arc), -np.inf), decode_arc)\n", + " self.heads_seq = tf.argmax(decode_arc, axis = 1)\n", + " \n", + " t = tf.cast(tf.transpose(self.heads_seq), tf.int32)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " self.tags_seq = tf.argmax(out_type, axis = 2)\n", + " \n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " minus_inf = -1e8\n", + " minus_mask = (1 - mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_type = tf.nn.log_softmax(out_type, dim=2)\n", + " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n", + " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n", + " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, 
batch])\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])\n", + " \n", + " t = tf.transpose(types)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_type = tf.gather_nd(loss_type, concatenated)\n", + " loss_type = tf.transpose(loss_type, [1, 0])\n", + " self.cost = (tf.reduce_sum(-loss_arc) / num) + (tf.reduce_sum(-loss_type) / num)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " \n", + " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", + " \n", + " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", + " mask_label = tf.boolean_mask(self.types, mask)\n", + " correct_pred = tf.equal(tf.cast(self.prediction, tf.int32), mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " self.prediction = tf.cast(tf.boolean_mask(self.heads_seq, mask), tf.int32)\n", + " mask_label = tf.boolean_mask(self.heads, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :183: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :238: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + 
"WARNING:tensorflow:From :277: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :300: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "dim is deprecated, use axis instead\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "dim_word = 128\n", + "dim_char = 256\n", + "dropout = 1.0\n", + "learning_rate = 1e-3\n", + "hidden_size_char = 128\n", + "hidden_size_word = 128\n", + "num_layers = 2\n", + "\n", + "model = Model(dim_word,dim_char,dropout,learning_rate,hidden_size_char,hidden_size_word,num_layers)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_char = train_char[:5]\n", + "batch_char = generate_char_seq(batch_char)\n", + "batch_y = train_Y[:5]\n", + "batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.0, 0.05172414, 7.4798884]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n", + " feed_dict = {model.words: batch_x,\n", + " model.chars: batch_char,\n", + " model.types: batch_y,\n", + " model.heads: batch_depends})" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": 
{ + "text/plain": [ + "(array([19, 19, 19, 23, 23, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23,\n", + " 23, 23, 23, 17, 17, 17, 17, 35, 35, 35, 43, 43, 43, 43, 43, 43, 43,\n", + " 35, 35]),\n", + " array([2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", + " 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0]),\n", + " array([ 0, 1, 1, 1, 6, 7, 1, 7, 8, 8, 8, 8, 8, 15, 8, 18, 18,\n", + " 7, 21, 21, 18, 23, 21, 21, 28, 28, 28, 21, 1, 0, 0, 0, 0, 0,\n", + " 0, 0], dtype=int32))" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tags_seq, heads = sess.run(\n", + " [model.tags_seq, model.heads_seq],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " model.chars: batch_char\n", + " },\n", + ")\n", + "tags_seq[0], heads[0], batch_depends[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.53it/s, accuracy=0.675, accuracy_depends=0.664, cost=2] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.80it/s, accuracy=0.688, accuracy_depends=0.66, cost=1.91] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:09, 5.39it/s, accuracy=0.641, accuracy_depends=0.606, cost=2.29]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 0, training loss: 4.134478, training acc: 0.409135, training depends: 0.386963, valid loss: 2.286293, valid acc: 0.647242, valid depends: 0.598646\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.57it/s, accuracy=0.782, accuracy_depends=0.767, cost=1.27] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.59it/s, accuracy=0.789, accuracy_depends=0.761, cost=1.13]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:11, 5.20it/s, 
accuracy=0.752, accuracy_depends=0.759, cost=1.31]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 1, training loss: 1.745138, training acc: 0.708187, training depends: 0.671352, valid loss: 1.594311, valid acc: 0.737908, valid depends: 0.680541\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:07<00:00, 5.59it/s, accuracy=0.796, accuracy_depends=0.802, cost=1.02] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:08<00:00, 14.77it/s, accuracy=0.818, accuracy_depends=0.806, cost=0.888]\n", + "train minibatch loop: 0%| | 0/375 [00:00:225: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :248: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "dim is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. 
Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Deprecated in favor of operator or tf.math.divide.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "learning_rate = 2e-5\n", + "hidden_size_word = 128\n", + "\n", + "model = Model(learning_rate, hidden_size_word)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_y = train_Y[:5]\n", + 
"batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.0070422534, 0.028169014, 12.410244]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n", + " feed_dict = {model.words: batch_x,\n", + " model.types: batch_y,\n", + " model.heads: batch_depends})" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 8, 8, 48, 34, 36, 36, 27, 30, 19, 8, 34, 29, 29, 41, 28, 41, 19,\n", + " 20, 20, 41, 47, 20, 23, 47, 28, 19, 27, 41, 18, 48, 36, 41, 27, 34,\n", + " 36, 4, 28, 8, 8, 8, 4, 8, 8, 4]),\n", + " array([20, 10, 16, 2, 9, 10, 0, 21, 1, 0, 2, 2, 2, 10, 10, 10, 17,\n", + " 36, 36, 10, 2, 36, 10, 2, 10, 0, 0, 10, 36, 16, 10, 10, 0, 2,\n", + " 10, 1, 10, 0, 0, 0, 0, 0, 0, 0]),\n", + " array([ 0, 1, 2, 2, 0, 2, 7, 8, 2, 8, 0, 0, 9, 9, 9, 9, 0,\n", + " 9, 16, 9, 19, 19, 8, 22, 22, 19, 24, 22, 0, 0, 22, 29, 29, 29,\n", + " 22, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32))" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tags_seq, heads = sess.run(\n", + " [model.tags_seq, model.heads_seq],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " },\n", + ")\n", + "tags_seq[0], heads[0], batch_depends[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:10<00:00, 5.31it/s, accuracy=0.754, accuracy_depends=0.482, cost=2.55] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.51it/s, accuracy=0.808, 
accuracy_depends=0.549, cost=2] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.64it/s, accuracy=0.746, accuracy_depends=0.383, cost=2.83]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 0, training loss: 4.682894, training acc: 0.433306, training depends: 0.308738, valid loss: 2.175135, valid acc: 0.757226, valid depends: 0.515791\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.884, accuracy_depends=0.641, cost=1.37] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.69it/s, accuracy=0.886, accuracy_depends=0.724, cost=0.95] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.72it/s, accuracy=0.848, accuracy_depends=0.53, cost=1.85]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 1, training loss: 1.797183, training acc: 0.815364, training depends: 0.561427, valid loss: 1.349600, valid acc: 0.857193, valid depends: 0.636366\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.889, accuracy_depends=0.695, cost=1.04] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.36it/s, accuracy=0.919, accuracy_depends=0.76, cost=0.708] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.68it/s, accuracy=0.877, accuracy_depends=0.61, cost=1.42]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 2, training loss: 1.193647, training acc: 0.869602, training depends: 0.653151, valid loss: 1.071987, valid acc: 0.879075, valid depends: 0.677740\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.912, accuracy_depends=0.691, cost=0.926]\n", + "test minibatch loop: 
100%|██████████| 120/120 [00:05<00:00, 20.55it/s, accuracy=0.919, accuracy_depends=0.779, cost=0.63] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.68it/s, accuracy=0.893, accuracy_depends=0.627, cost=1.16]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 3, training loss: 0.931854, training acc: 0.892288, training depends: 0.696346, valid loss: 1.005326, valid acc: 0.883707, valid depends: 0.692016\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.914, accuracy_depends=0.739, cost=0.762]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.45it/s, accuracy=0.912, accuracy_depends=0.799, cost=0.51] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.66it/s, accuracy=0.889, accuracy_depends=0.654, cost=1.01]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 4, training loss: 0.777697, training acc: 0.901257, training depends: 0.721131, valid loss: 0.964560, valid acc: 0.877505, valid depends: 0.701398\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.913, accuracy_depends=0.755, cost=0.638]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.68it/s, accuracy=0.912, accuracy_depends=0.812, cost=0.492]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:04, 5.79it/s, accuracy=0.893, accuracy_depends=0.659, cost=0.932]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 5, training loss: 0.668966, training acc: 0.901568, training depends: 0.741328, valid loss: 0.928792, valid acc: 0.891125, valid depends: 0.710297\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.913, 
accuracy_depends=0.751, cost=0.616]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.45it/s, accuracy=0.919, accuracy_depends=0.825, cost=0.394]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.896, accuracy_depends=0.707, cost=0.809]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 6, training loss: 0.594000, training acc: 0.913134, training depends: 0.754423, valid loss: 0.943845, valid acc: 0.888382, valid depends: 0.713479\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.927, accuracy_depends=0.776, cost=0.537] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.37it/s, accuracy=0.935, accuracy_depends=0.808, cost=0.534]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.61it/s, accuracy=0.909, accuracy_depends=0.709, cost=0.77]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 7, training loss: 0.538314, training acc: 0.920553, training depends: 0.764744, valid loss: 0.930650, valid acc: 0.903622, valid depends: 0.718959\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.935, accuracy_depends=0.781, cost=0.505]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.38it/s, accuracy=0.938, accuracy_depends=0.821, cost=0.457]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:07, 5.53it/s, accuracy=0.915, accuracy_depends=0.711, cost=0.767]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 8, training loss: 0.486278, training acc: 0.927081, training depends: 0.774812, valid loss: 0.932128, valid acc: 0.904604, valid depends: 0.722158\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 
100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.925, accuracy_depends=0.787, cost=0.485] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.50it/s, accuracy=0.958, accuracy_depends=0.825, cost=0.524]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.61it/s, accuracy=0.924, accuracy_depends=0.735, cost=0.633]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 9, training loss: 0.447538, training acc: 0.931575, training depends: 0.781835, valid loss: 0.943356, valid acc: 0.905484, valid depends: 0.722892\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.942, accuracy_depends=0.806, cost=0.424] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.55it/s, accuracy=0.935, accuracy_depends=0.815, cost=0.496]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.896, accuracy_depends=0.748, cost=0.611]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 10, training loss: 0.413205, training acc: 0.932623, training depends: 0.789132, valid loss: 0.954858, valid acc: 0.903540, valid depends: 0.724419\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.943, accuracy_depends=0.788, cost=0.442] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.28it/s, accuracy=0.945, accuracy_depends=0.795, cost=0.602]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.69it/s, accuracy=0.92, accuracy_depends=0.761, cost=0.558]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 11, training loss: 0.389162, training acc: 0.934991, training depends: 0.793624, valid loss: 0.962155, valid acc: 0.910515, valid depends: 0.726305\n", + "\n" + ] + }, + { + "name": "stderr", + 
"output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.65it/s, accuracy=0.943, accuracy_depends=0.806, cost=0.433] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.37it/s, accuracy=0.942, accuracy_depends=0.828, cost=0.454]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.67it/s, accuracy=0.919, accuracy_depends=0.759, cost=0.538]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 12, training loss: 0.368160, training acc: 0.940245, training depends: 0.797881, valid loss: 0.978189, valid acc: 0.906123, valid depends: 0.726453\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.66it/s, accuracy=0.942, accuracy_depends=0.807, cost=0.404] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.48it/s, accuracy=0.951, accuracy_depends=0.844, cost=0.43] \n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.63it/s, accuracy=0.934, accuracy_depends=0.759, cost=0.563]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 13, training loss: 0.356211, training acc: 0.941396, training depends: 0.800658, valid loss: 0.964498, valid acc: 0.910670, valid depends: 0.727217\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:06<00:00, 5.67it/s, accuracy=0.943, accuracy_depends=0.814, cost=0.378] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:05<00:00, 20.28it/s, accuracy=0.945, accuracy_depends=0.805, cost=0.468]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 14, training loss: 0.346428, training acc: 0.943538, training depends: 0.802292, valid loss: 0.971327, valid acc: 0.908659, valid depends: 0.727845\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } 
+ ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "batch_size = 32\n", + "epoch = 15\n", + "\n", + "for e in range(epoch):\n", + " train_acc, train_loss = [], []\n", + " test_acc, test_loss = [], []\n", + " train_acc_depends, test_acc_depends = [], []\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_X))\n", + " batch_x = train_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_y = train_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = train_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + " \n", + " acc_depends, acc, cost, _ = sess.run(\n", + " [model.accuracy_depends, model.accuracy, model.cost, model.optimizer],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " model.types: batch_y,\n", + " model.heads: batch_depends\n", + " },\n", + " )\n", + " train_loss.append(cost)\n", + " train_acc.append(acc)\n", + " train_acc_depends.append(acc_depends)\n", + " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_y = test_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = test_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + " \n", + " acc_depends, acc, cost = sess.run(\n", + " [model.accuracy_depends, model.accuracy, model.cost],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " model.types: batch_y,\n", + " model.heads: batch_depends\n", + " },\n", + " )\n", + " test_loss.append(cost)\n", + " 
test_acc.append(acc)\n", + " test_acc_depends.append(acc_depends)\n", + " pbar.set_postfix(cost = cost, accuracy = acc, accuracy_depends = acc_depends)\n", + " \n", + " \n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, training depends: %f, valid loss: %f, valid acc: %f, valid depends: %f\\n'\n", + " % (e, np.mean(train_loss), \n", + " np.mean(train_acc), \n", + " np.mean(train_acc_depends), \n", + " np.mean(test_loss), \n", + " np.mean(test_acc), \n", + " np.mean(test_acc_depends)\n", + " ))\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 0, 40, 6, 22, 26, 23, 18, 16, 1, 1, 5, 3, 13, 10, 11, 6, 12,\n", + " 13, 10, 16, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0]),\n", + " array([ 3, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 10, 8, 14, 13, 8,\n", + " 15, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", + " -1, -1, -1, -1, -1, -1, -1, -1]),\n", + " array([-1, 2, 8, 5, 5, 2, 8, 8, -1, -1, 0, 11, 11, 8, 14, 14, 8,\n", + " 16, 14, 14, 8, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n", + " -1, -1, -1, -1, -1, -1, -1, -1], dtype=int32))" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tags_seq, heads = sess.run(\n", + " [model.tags_seq, model.heads_seq],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " },\n", + ")\n", + "tags_seq[0], heads[0] - 1, batch_depends[0] - 1" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "def evaluate(heads_pred, types_pred, heads, types, lengths,\n", + " symbolic_root=False, symbolic_end=False):\n", + " batch_size, _ = heads_pred.shape\n", + " ucorr = 0.\n", + " lcorr = 0.\n", + " total = 0.\n", + " ucomplete_match = 0.\n", + " lcomplete_match = 0.\n", + "\n", + " corr_root = 0.\n", + " total_root = 0.\n", + " start = 1 if 
symbolic_root else 0\n", + " end = 1 if symbolic_end else 0\n", + " for i in range(batch_size):\n", + " ucm = 1.\n", + " lcm = 1.\n", + " for j in range(start, lengths[i] - end):\n", + "\n", + " total += 1\n", + " if heads[i, j] == heads_pred[i, j]:\n", + " ucorr += 1\n", + " if types[i, j] == types_pred[i, j]:\n", + " lcorr += 1\n", + " else:\n", + " lcm = 0\n", + " else:\n", + " ucm = 0\n", + " lcm = 0\n", + "\n", + " if heads[i, j] == 0:\n", + " total_root += 1\n", + " corr_root += 1 if heads_pred[i, j] == 0 else 0\n", + "\n", + " ucomplete_match += ucm\n", + " lcomplete_match += lcm\n", + " \n", + " return ucorr / total, lcorr / total, corr_root / total_root" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "arcs, types, roots = [], [], []\n", + "\n", + "for i in range(0, len(test_X), batch_size):\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_y = test_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = test_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + " \n", + " tags_seq, heads = sess.run(\n", + " [model.tags_seq, model.heads_seq],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " },\n", + " )\n", + " \n", + " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads - 1, tags_seq, batch_depends - 1, batch_y, \n", + " np.count_nonzero(batch_x, axis = 1))\n", + " arcs.append(arc_accuracy)\n", + " types.append(type_accuracy)\n", + " roots.append(root_accuracy)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arc accuracy: 0.728543873570515\n", + "types accuracy: 0.6711201611430444\n", + "root accuracy: 0.7393229166666667\n" + ] + } + ], + "source": [ + "print('arc accuracy:', 
np.mean(arcs))\n", + "print('types accuracy:', np.mean(types))\n", + "print('root accuracy:', np.mean(roots))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/dependency-parser/7.stackpointer.ipynb b/dependency-parser/7.stackpointer.ipynb new file mode 100644 index 0000000..4238bf9 --- /dev/null +++ b/dependency-parser/7.stackpointer.ipynb @@ -0,0 +1,1781 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import malaya\n", + "import re\n", + "from malaya.texts._text_functions import split_into_sentences\n", + "from malaya.texts import _regex\n", + "import numpy as np\n", + "import itertools\n", + "\n", + "tokenizer = malaya.preprocessing._tokenizer\n", + "splitter = split_into_sentences" + ] + }, + { + "cell_type": "code", + 
"execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def is_number_regex(s):\n", + " if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n", + " return s.isdigit()\n", + " return True\n", + "\n", + "def preprocessing(w):\n", + " if is_number_regex(w):\n", + " return ''\n", + " elif re.match(_regex._money, w):\n", + " return ''\n", + " elif re.match(_regex._date, w):\n", + " return ''\n", + " elif re.match(_regex._expressions['email'], w):\n", + " return ''\n", + " elif re.match(_regex._expressions['url'], w):\n", + " return ''\n", + " else:\n", + " w = ''.join(''.join(s)[:2] for _, s in itertools.groupby(w))\n", + " return w" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "({'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2,\n", + " '': 3,\n", + " '': 4,\n", + " '': 5,\n", + " '': 6,\n", + " '': 7},\n", + " {'PAD': 0,\n", + " 'UNK': 1,\n", + " '_ROOT': 2,\n", + " '': 3,\n", + " '': 4,\n", + " '': 5,\n", + " '': 6,\n", + " '': 7})" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "tag2idx = {'PAD': 0, '_': 1}\n", + "char2idx = {'PAD': 0,'UNK':1, '_ROOT': 2}\n", + "word_idx = 3\n", + "tag_idx = 2\n", + "char_idx = 3\n", + "\n", + "special_tokens = ['', '', '', '', '']\n", + "\n", + "for t in special_tokens:\n", + " word2idx[t] = word_idx\n", + " word_idx += 1\n", + " char2idx[t] = char_idx\n", + " char_idx += 1\n", + " \n", + "word2idx, char2idx" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "PAD = \"_PAD\"\n", + "PAD_POS = \"_PAD_POS\"\n", + "PAD_TYPE = \"_\"\n", + "PAD_CHAR = \"_PAD_CHAR\"\n", + "ROOT = \"_ROOT\"\n", + "ROOT_POS = \"_ROOT_POS\"\n", + "ROOT_TYPE = \"_\"\n", + "ROOT_CHAR = \"_ROOT_CHAR\"\n", + "END = \"_END\"\n", + "END_POS = \"_END_POS\"\n", + "END_TYPE = \"_\"\n", + "END_CHAR = 
\"_END_CHAR\"\n", + "\n", + "def process_corpus(corpus, until = None):\n", + " global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", + " sentences, words, depends, labels, pos, chars = [], [], [], [], [], []\n", + " temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n", + " first_time = True\n", + " for sentence in corpus:\n", + " try:\n", + " if len(sentence):\n", + " if sentence[0] == '#':\n", + " continue\n", + " if first_time:\n", + " print(sentence)\n", + " first_time = False\n", + " sentence = sentence.split('\\t')\n", + " for c in sentence[1]:\n", + " if c not in char2idx:\n", + " char2idx[c] = char_idx\n", + " char_idx += 1\n", + " if sentence[7] not in tag2idx:\n", + " tag2idx[sentence[7]] = tag_idx\n", + " tag_idx += 1\n", + " sentence[1] = preprocessing(sentence[1])\n", + " if sentence[1] not in word2idx:\n", + " word2idx[sentence[1]] = word_idx\n", + " word_idx += 1\n", + " temp_word.append(word2idx[sentence[1]])\n", + " temp_depend.append(int(sentence[6]))\n", + " temp_label.append(tag2idx[sentence[7]])\n", + " temp_sentence.append(sentence[1])\n", + " temp_pos.append(sentence[3])\n", + " else:\n", + " if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " continue\n", + " words.append([word2idx['_ROOT']] + temp_word)\n", + " depends.append([0] + temp_depend)\n", + " labels.append([tag2idx['_']] + temp_label)\n", + " sentences.append([ROOT] + temp_sentence)\n", + " pos.append([ROOT_POS] + temp_pos)\n", + " char_ = [[char2idx['_ROOT']]]\n", + " for w in temp_sentence:\n", + " if w in char2idx:\n", + " char_.append([char2idx[w]])\n", + " else:\n", + " char_.append([char2idx[c] for c in w])\n", + " chars.append(char_)\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " except Exception as e:\n", + " 
print(e, sentence)\n", + " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], chars[:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def _obtain_child_index_for_left2right(heads):\n", + " child_ids = [[] for _ in range(len(heads))]\n", + " # skip the symbolic root.\n", + " for child in range(1, len(heads)):\n", + " head = heads[child]\n", + " child_ids[head].append(child)\n", + " return child_ids\n", + "\n", + "\n", + "def _obtain_child_index_for_inside_out(heads):\n", + " child_ids = [[] for _ in range(len(heads))]\n", + " for head in range(len(heads)):\n", + " # first find left children inside-out\n", + " for child in reversed(range(1, head)):\n", + " if heads[child] == head:\n", + " child_ids[head].append(child)\n", + " # second find right children inside-out\n", + " for child in range(head + 1, len(heads)):\n", + " if heads[child] == head:\n", + " child_ids[head].append(child)\n", + " return child_ids\n", + "\n", + "\n", + "def _obtain_child_index_for_depth(heads, reverse):\n", + " def calc_depth(head):\n", + " children = child_ids[head]\n", + " max_depth = 0\n", + " for child in children:\n", + " depth = calc_depth(child)\n", + " child_with_depth[head].append((child, depth))\n", + " max_depth = max(max_depth, depth + 1)\n", + " child_with_depth[head] = sorted(child_with_depth[head], key=lambda x: x[1], reverse=reverse)\n", + " return max_depth\n", + "\n", + " child_ids = _obtain_child_index_for_left2right(heads)\n", + " child_with_depth = [[] for _ in range(len(heads))]\n", + " calc_depth(0)\n", + " return [[child for child, depth in child_with_depth[head]] for head in range(len(heads))]\n", + "\n", + "\n", + "def _generate_stack_inputs(heads, types, prior_order):\n", + " if prior_order == 'deep_first':\n", + " child_ids = _obtain_child_index_for_depth(heads, True)\n", + " elif prior_order == 'shallow_first':\n", + " child_ids = _obtain_child_index_for_depth(heads, 
False)\n", + " elif prior_order == 'left2right':\n", + " child_ids = _obtain_child_index_for_left2right(heads)\n", + " elif prior_order == 'inside_out':\n", + " child_ids = _obtain_child_index_for_inside_out(heads)\n", + " else:\n", + " raise ValueError('Unknown prior order: %s' % prior_order)\n", + "\n", + " stacked_heads = []\n", + " children = []\n", + " siblings = []\n", + " stacked_types = []\n", + " skip_connect = []\n", + " prev = [0 for _ in range(len(heads))]\n", + " sibs = [0 for _ in range(len(heads))]\n", + " stack = [0]\n", + " position = 1\n", + " while len(stack) > 0:\n", + " head = stack[-1]\n", + " stacked_heads.append(head)\n", + " siblings.append(sibs[head])\n", + " child_id = child_ids[head]\n", + " skip_connect.append(prev[head])\n", + " prev[head] = position\n", + " if len(child_id) == 0:\n", + " children.append(head)\n", + " sibs[head] = 0\n", + " stacked_types.append(tag2idx['PAD'])\n", + " stack.pop()\n", + " else:\n", + " child = child_id.pop(0)\n", + " children.append(child)\n", + " sibs[head] = child\n", + " stack.append(child)\n", + " stacked_types.append(types[child])\n", + " position += 1\n", + "\n", + " return stacked_heads, children, siblings, stacked_types, skip_connect" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n", + "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-dev.conllu') as fopen:\n", + " dev = fopen.read().split('\\n')\n", + "\n", + "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev = process_corpus(dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + 
"outputs": [], + "source": [ + "stacked_heads_test, children_test, siblings_test, stacked_types_test = [], [], [], []\n", + "for i in range(len(sentences_dev)):\n", + " stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_dev[i], \n", + " labels_dev[i], 'deep_first')\n", + " stacked_heads_test.append(stacked_heads)\n", + " children_test.append(children)\n", + " siblings_test.append(siblings)\n", + " stacked_types_test.append(stacked_types)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n", + "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-test.conllu') as fopen:\n", + " test = fopen.read().split('\\n')\n", + "\n", + "sentences_test, words_test, depends_test, labels_test, _, seq_test = process_corpus(test)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(len(sentences_test)):\n", + " stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_test[i], \n", + " labels_test[i], 'deep_first')\n", + " stacked_heads_test.append(stacked_heads)\n", + " children_test.append(children)\n", + " siblings_test.append(siblings)\n", + " stacked_types_test.append(stacked_types)\n", + " \n", + "sentences_test.extend(sentences_dev)\n", + "words_test.extend(words_dev)\n", + "depends_test.extend(depends_dev)\n", + "labels_test.extend(labels_dev)\n", + "seq_test.extend(seq_dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n", + "invalid literal for 
int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n", + "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '16:conj:and', 'CopyOf=16']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n", + "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 
'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n", + "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n", + "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 'CopyOf=7']\n", + "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-train.conllu') as fopen:\n", + " train = fopen.read().split('\\n')\n", + "\n", + "sentences_train, words_train, depends_train, labels_train, _, _ = process_corpus(train)\n", + "\n", + "stacked_heads_train, children_train, siblings_train, stacked_types_train = [], [], [], []\n", + "for i in range(len(sentences_train)):\n", + " 
stacked_heads, children, siblings, stacked_types, _ = _generate_stack_inputs(depends_train[i], \n", + " labels_train[i], 'deep_first')\n", + " stacked_heads_train.append(stacked_heads)\n", + " children_train.append(children)\n", + " siblings_train.append(siblings)\n", + " stacked_types_train.append(stacked_types)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(12000, 3824)" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(sentences_train), len(sentences_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "21974" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "idx2word = {v:k for k, v in word2idx.items()}\n", + "idx2tag = {v:k for k, v in tag2idx.items()}\n", + "len(idx2word)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "from enum import Enum\n", + "\n", + "class PriorOrder(Enum):\n", + " DEPTH = 0\n", + " INSIDE_OUT = 1\n", + " LEFT2RIGHT = 2\n", + "\n", + "class BiAAttention:\n", + " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n", + " self.input_size_encoder = input_size_encoder\n", + " self.input_size_decoder = input_size_decoder\n", + " self.num_labels = num_labels\n", + " \n", + " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.U = tf.get_variable(\"U", 
shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n", + " batch = tf.shape(input_d)[0]\n", + " length_decoder = tf.shape(input_d)[1]\n", + " length_encoder = tf.shape(input_e)[1]\n", + " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n", + " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n", + " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n", + " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n", + " \n", + " output = output + out_d + out_e\n", + " \n", + " if mask_d is not None:\n", + " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n", + " e = tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n", + " output = output * d * e\n", + " \n", + " return output\n", + " \n", + "class BiLinear:\n", + " def __init__(self, left_features, right_features, out_features):\n", + " self.left_features = left_features\n", + " self.right_features = right_features\n", + " self.out_features = out_features\n", + " \n", + " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_left, input_right):\n", + " left_size = tf.shape(input_left)\n", + " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n", + " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n", + " input_left = tf.reshape(input_left, (batch, self.left_features))\n", + " input_right = tf.reshape(input_right, (batch, 
self.right_features))\n", + " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n", + " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n", + " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n", + " + tf.matmul(input_right, tf.transpose(self.W_r))\n", + " \n", + " return tf.reshape(output, output_shape)\n", + "\n", + "class StackPointer:\n", + " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n", + " input_size_decoder, hidden_size, layers,\n", + " num_labels, arc_space, type_space):\n", + " \n", + " def cells(size, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size,\n", + " initializer=tf.orthogonal_initializer(),reuse=reuse,\n", + " state_is_tuple=False)\n", + " \n", + " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n", + " self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n", + " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n", + " self.num_labels = num_labels\n", + " self.prior_order = PriorOrder.DEPTH\n", + " self.char_dim = char_dim\n", + " self.layers = layers\n", + " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n", + " state_is_tuple=False)\n", + " self.encoder_char = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n", + " state_is_tuple=False)\n", + " self.decoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(layers)],\n", + " state_is_tuple=False)\n", + " self.hidden_size = hidden_size\n", + " self.arc_space = arc_space\n", + " \n", + " \n", + " self.src_dense = tf.layers.Dense(hidden_size)\n", + " self.hx_dense = tf.layers.Dense(hidden_size)\n", + "\n", + " self.arc_h = tf.layers.Dense(arc_space)\n", + " self.arc_c = tf.layers.Dense(arc_space)\n", + " self.attention = BiAAttention(arc_space, arc_space, 1)\n", + "\n", + " self.type_h = 
tf.layers.Dense(type_space)\n", + " self.type_c = tf.layers.Dense(type_space)\n", + " self.bilinear = BiLinear(type_space, type_space, self.num_labels)\n", + " \n", + " def encode(self, input_word, input_char):\n", + " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n", + " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n", + " s = tf.shape(char)\n", + " char = tf.reshape(\n", + " char, shape = [s[0] * s[1], s[-2], self.char_dim]\n", + " )\n", + " output, _ = tf.nn.dynamic_rnn(self.encoder_char, char, dtype = tf.float32,\n", + " scope = 'encoder-char')\n", + " output = tf.reshape(\n", + " output[:, -1], shape = [s[0], s[1], self.hidden_size]\n", + " )\n", + " word_embedded = tf.concat([word, output], axis = -1)\n", + " output, hn = tf.nn.dynamic_rnn(self.encoder, word_embedded, dtype = tf.float32,\n", + " scope = 'encoder')\n", + " return output, hn\n", + " \n", + " def decode(self, output_encoder, heads, heads_stack, siblings, hn):\n", + " batch = tf.shape(output_encoder)[0]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.transpose(heads_stack)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " src_encoding = tf.gather_nd(output_encoder, concatenated)\n", + " \n", + " mask_sibs = tf.expand_dims(tf.cast(tf.not_equal(siblings, 0), tf.float32), axis = 2)\n", + " t = tf.transpose(siblings)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " output_enc_sibling = tf.gather_nd(output_encoder, concatenated) * mask_sibs\n", + " src_encoding = src_encoding + output_enc_sibling\n", + " \n", + " t = tf.transpose(heads_stack)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = 
tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)],axis = 0))\n", + " g = tf.transpose(tf.gather_nd(heads, concatenated))\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(g))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(g, axis = 0)],axis = 0))\n", + " output_enc_gpar = tf.gather_nd(output_encoder, concatenated)\n", + " src_encoding = src_encoding + output_enc_gpar\n", + " \n", + " src_encoding = tf.nn.elu(self.src_dense(src_encoding))\n", + " output, hn = tf.nn.dynamic_rnn(self.decoder, src_encoding, dtype = tf.float32,\n", + " initial_state = hn,\n", + " scope = 'decoder')\n", + " return output, hn\n", + " \n", + " def loss(self, input_word, input_char, \n", + " heads, stacked_heads, children, siblings, stacked_types,\n", + " mask_e, mask_d,\n", + " label_smooth = 1.0):\n", + " \n", + " output_enc, hn_enc = self.encode(input_word, input_char)\n", + " arc_c = tf.nn.elu(self.arc_c(output_enc))\n", + " type_c = tf.nn.elu(self.type_c(output_enc))\n", + " \n", + " output_dec, _ = self.decode(output_enc, heads, stacked_heads, siblings, hn_enc)\n", + " arc_h = tf.nn.elu(self.arc_h(output_dec))\n", + " type_h = tf.nn.elu(self.type_h(output_dec))\n", + " \n", + " max_len_d = tf.shape(arc_h)[1]\n", + " \n", + " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_c, mask_d=mask_d, mask_e=mask_e), axis = 1)\n", + " batch = tf.shape(arc_c)[0]\n", + " max_len_e = tf.shape(arc_c)[1]\n", + " batch_index = tf.range(0, batch)\n", + " \n", + " t = tf.transpose(children)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_c = tf.gather_nd(type_c, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " print(out_arc.shape,out_type.shape)\n", + " \n", + " minus_inf = -1e8\n", + 
" minus_mask_d = (1 - mask_d) * minus_inf\n", + " minus_mask_e = (1 - mask_e) * minus_inf\n", + " \n", + " out_arc = out_arc + tf.expand_dims(minus_mask_d, 2) + tf.expand_dims(minus_mask_e, 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, axis = 2)\n", + " loss_type = tf.nn.log_softmax(out_type, axis = 2)\n", + " coverage = tf.cumsum(tf.exp(loss_arc), axis = 1)\n", + " \n", + " mask_leaf = tf.cast(tf.equal(children, stacked_heads), tf.float32)\n", + " mask_non_leaf = (1.0 - mask_leaf)\n", + " \n", + " mask_d_2 = tf.expand_dims(mask_d, 2)\n", + " mask_e_1 = tf.expand_dims(mask_e, 1)\n", + " \n", + " loss_arc = loss_arc * mask_d_2 * mask_e_1\n", + " coverage = coverage * mask_d_2 * mask_e_1\n", + " loss_type = loss_type * mask_d_2\n", + " mask_leaf = mask_leaf * mask_d\n", + " mask_non_leaf = mask_non_leaf * mask_d\n", + " num_leaf = tf.reduce_sum(mask_leaf)\n", + " num_non_leaf = tf.reduce_sum(mask_non_leaf)\n", + " head_index = tf.tile(tf.expand_dims(tf.range(0, max_len_d), 1), [1, batch])\n", + " \n", + " t = tf.transpose(children)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(head_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " \n", + " t = tf.transpose(stacked_types)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(head_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_type = tf.gather_nd(loss_type, concatenated)\n", + " \n", + " loss_arc_leaf = loss_arc * mask_leaf\n", + " loss_arc_non_leaf = loss_arc * mask_non_leaf\n", + "\n", + " loss_type_leaf = loss_type * mask_leaf\n", + " loss_type_non_leaf = loss_type * mask_non_leaf\n", + " \n", + " loss_cov = tf.clip_by_value(coverage - 2.0, 0.0, 100.0)\n", + " 
\n", + " return (tf.reduce_sum(-loss_arc_leaf) / num_leaf, \n", + " tf.reduce_sum(-loss_arc_non_leaf) / num_non_leaf,\n", + " tf.reduce_sum(-loss_type_leaf) / num_leaf, \n", + " tf.reduce_sum(-loss_type_non_leaf) / num_non_leaf,\n", + " tf.reduce_sum(loss_cov) / (num_leaf + num_non_leaf), \n", + " num_leaf, \n", + " num_non_leaf)\n", + " \n", + "class Model:\n", + " def __init__(self, learning_rate = 1e-3, cov = 0.0):\n", + " self.stackpointer = StackPointer(word_dim = 128, \n", + " num_words = len(word2idx), \n", + " char_dim = 128, \n", + " num_chars = len(char2idx), \n", + " num_filters = 128, \n", + " kernel_size = 3,\n", + " input_size_decoder = 256, \n", + " hidden_size = 256, \n", + " layers = 1,\n", + " num_labels = len(tag2idx), \n", + " arc_space = 128, \n", + " type_space = 128)\n", + " self.words = tf.placeholder(tf.int32, (None, None))\n", + " self.chars = tf.placeholder(tf.int32, (None, None, None))\n", + " self.heads = tf.placeholder(tf.int32, (None, None))\n", + " self.stacked_heads = tf.placeholder(tf.int32, (None, None))\n", + " self.siblings = tf.placeholder(tf.int32, (None, None))\n", + " self.childrens = tf.placeholder(tf.int32, (None, None))\n", + " self.stacked_types = tf.placeholder(tf.int32, (None, None))\n", + " self.mask_e = tf.placeholder(tf.float32, (None, None))\n", + " self.mask_d = tf.placeholder(tf.float32, (None, None))\n", + " loss_arc_leaf, loss_arc_non_leaf, \\\n", + " loss_type_leaf, loss_type_non_leaf, \\\n", + " loss_cov, num_leaf, num_non_leaf = self.stackpointer.loss(self.words, self.chars, self.heads, \n", + " self.stacked_heads, self.childrens, \n", + " self.siblings, self.stacked_types,\n", + " self.mask_e, self.mask_d)\n", + " loss_arc = loss_arc_leaf + loss_arc_non_leaf\n", + " loss_type = loss_type_leaf + loss_type_non_leaf\n", + " self.cost = loss_arc + loss_type + cov * loss_cov\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " \n", + " self.encode_output, 
self.encode_hidden = self.stackpointer.encode(self.words, self.chars)\n", + " self.encode_arc_c = tf.nn.elu(self.stackpointer.arc_c(self.encode_output))\n", + " self.type_c = tf.nn.elu(self.stackpointer.type_c(self.encode_output))\n", + " \n", + " self.src_encoding = tf.placeholder(tf.float32, (None, self.stackpointer.hidden_size))\n", + " self.arc_c = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n", + " self.hx = tf.placeholder(tf.float32, (None, \n", + " self.stackpointer.hidden_size * 2 * self.stackpointer.layers)) \n", + " \n", + " src_encoding = tf.nn.elu(self.stackpointer.src_dense(self.src_encoding))\n", + " output_dec, hx = self.stackpointer.decoder(src_encoding, self.hx)\n", + " arc_h = tf.nn.elu(self.stackpointer.arc_h(tf.expand_dims(output_dec, axis = 1)))\n", + " type_h = tf.nn.elu(self.stackpointer.type_h(output_dec))\n", + " out_arc = self.stackpointer.attention.forward(arc_h, tf.expand_dims(self.arc_c, 0))\n", + " out_arc = tf.squeeze(tf.squeeze(out_arc, axis = 1), axis = 1)\n", + " self.hyp_scores = tf.nn.log_softmax(out_arc, axis = 1)\n", + " self.type_h = type_h\n", + " self.decode_hidden = hx\n", + " \n", + " self.holder_type_h = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n", + " self.holder_type_c = tf.placeholder(tf.float32, (None, self.stackpointer.arc_space))\n", + " \n", + " out_type = self.stackpointer.bilinear.forward(self.holder_type_h, self.holder_type_c)\n", + " self.hyp_type_scores = tf.nn.log_softmax(out_type, axis = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :73: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:: Using a 
concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n", + "WARNING:tensorflow:From :83: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n", + "WARNING:tensorflow:: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.\n", + "WARNING:tensorflow:From :111: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "(?, ?, ?) 
(?, ?, 52)\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = words_train\n", + "train_Y = labels_train\n", + "train_depends = depends_train\n", + "train_char = sentences_train\n", + "\n", + "test_X = words_test\n", + "test_Y = labels_test\n", + "test_depends = depends_test\n", + "test_char = sentences_test" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "prior_order = model.stackpointer.prior_order\n", + "\n", + "def decode_sentence(output_enc, arc_c, type_c, hx, beam, length, ordered, leading_symbolic):\n", + " def valid_hyp(base_id, child_id, head):\n", + " if constraints[base_id, child_id]:\n", + " return False\n", + " elif not ordered or prior_order == PriorOrder.DEPTH or child_orders[base_id, head] == 0:\n", + " return True\n", + " elif prior_order == PriorOrder.LEFT2RIGTH:\n", + " return child_id > child_orders[base_id, head]\n", + " else:\n", + " if child_id < head:\n", + " return child_id < child_orders[base_id, head] < head\n", + " else:\n", + " return child_id > child_orders[base_id, head]\n", + " \n", + " length = output_enc.shape[0] if length is None else length\n", + " \n", + " stacked_heads = [[0] for _ in range(beam)]\n", + " grand_parents = [[0] for _ in range(beam)]\n", + " siblings = [[0] for _ in range(beam)]\n", + " children = np.zeros((beam, 2 * length - 1))\n", + " stacked_types = 
np.zeros((beam, 2 * length - 1))\n", +    "    \n", +    "    hypothesis_scores = np.zeros(beam)\n", +    "    constraints = np.zeros([beam, length], dtype=np.bool)\n", +    "    constraints[:, 0] = True\n", +    "    child_orders = np.zeros([beam, length], dtype=np.int64)\n", +    "\n", +    "    new_stacked_heads = [[] for _ in range(beam)]\n", +    "    new_grand_parents = [[] for _ in range(beam)]\n", +    "    new_siblings = [[] for _ in range(beam)]\n", +    "    new_skip_connects = [[] for _ in range(beam)]\n", +    "    new_children = np.zeros((beam, 2 * length - 1))\n", +    "    new_stacked_types = np.zeros((beam, 2 * length - 1))\n", +    "    num_hyp = 1\n", +    "    num_step = 2 * length - 1\n", +    "    for t in range(num_step):\n", +    "        heads = np.array([stacked_heads[i][-1] for i in range(num_hyp)])\n", +    "        gpars = np.array([grand_parents[i][-1] for i in range(num_hyp)])\n", +    "        sibs = np.array([siblings[i].pop() for i in range(num_hyp)])\n", +    "        src_encoding = output_enc[heads]\n", +    "        mask_sibs = np.expand_dims((np.array(sibs) != 0).astype(np.float32), axis = 1)\n", +    "        output_enc_sibling = output_enc[sibs] * mask_sibs\n", +    "        src_encoding = src_encoding + output_enc_sibling\n", +    "        output_enc_gpar = output_enc[gpars]\n", +    "        src_encoding = src_encoding + output_enc_gpar\n", +    "        hyp_scores, type_h, hx = sess.run([model.hyp_scores, model.type_h, model.decode_hidden],\n", +    "                                          feed_dict = {model.src_encoding: src_encoding,\n", +    "                                                       model.arc_c: arc_c,\n", +    "                                                       model.hx: hx})\n", +    "        \n", +    "        new_hypothesis_scores = np.expand_dims(hypothesis_scores[:num_hyp], axis = 1) + hyp_scores\n", +    "        new_hypothesis_scores = new_hypothesis_scores.reshape((-1))\n", +    "        hyp_index = np.argsort(new_hypothesis_scores)[::-1]\n", +    "        new_hypothesis_scores = np.sort(new_hypothesis_scores)[::-1]\n", +    "        base_index = (hyp_index // length)\n", +    "        child_index = hyp_index % length\n", +    "        cc = 0\n", +    "        ids = []\n", +    "        new_constraints = np.zeros([beam, length], 
dtype=np.bool)\n", + " new_child_orders = np.zeros([beam, length], dtype=np.int64)\n", + " for id_ in range(num_hyp * length):\n", + " base_id = base_index[id_]\n", + " if base_id:\n", + " ids.append(id_)\n", + " continue\n", + " child_id = child_index[id_]\n", + " head = heads[base_id]\n", + " new_hyp_score = new_hypothesis_scores[id_]\n", + " if child_id == head:\n", + " if head != 0 or t + 1 == num_step:\n", + " new_constraints[cc] = constraints[base_id]\n", + " new_child_orders[cc] = child_orders[base_id]\n", + "\n", + " new_stacked_heads[cc] = [stacked_heads[base_id][i] for i in range(len(stacked_heads[base_id]))]\n", + " new_stacked_heads[cc].pop()\n", + "\n", + " new_grand_parents[cc] = [grand_parents[base_id][i] for i in range(len(grand_parents[base_id]))]\n", + " new_grand_parents[cc].pop()\n", + "\n", + " new_siblings[cc] = [siblings[base_id][i] for i in range(len(siblings[base_id]))]\n", + "\n", + " new_children[cc] = children[base_id]\n", + " new_children[cc, t] = child_id\n", + "\n", + " hypothesis_scores[cc] = new_hyp_score\n", + " ids.append(id_)\n", + " cc += 1\n", + " elif valid_hyp(base_id, child_id, head):\n", + " new_constraints[cc] = constraints[base_id]\n", + " new_constraints[cc, child_id] = True\n", + "\n", + " new_child_orders[cc] = child_orders[base_id]\n", + " new_child_orders[cc, head] = child_id\n", + "\n", + " new_stacked_heads[cc] = [stacked_heads[base_id][i] for i in range(len(stacked_heads[base_id]))]\n", + " new_stacked_heads[cc].append(child_id)\n", + "\n", + " new_grand_parents[cc] = [grand_parents[base_id][i] for i in range(len(grand_parents[base_id]))]\n", + " new_grand_parents[cc].append(head)\n", + "\n", + " new_siblings[cc] = [siblings[base_id][i] for i in range(len(siblings[base_id]))]\n", + " new_siblings[cc].append(child_id)\n", + " new_siblings[cc].append(0)\n", + "\n", + " new_children[cc] = children[base_id]\n", + " new_children[cc, t] = child_id\n", + "\n", + " hypothesis_scores[cc] = new_hyp_score\n", + " 
ids.append(id_)\n", + " cc += 1\n", + " \n", + " if cc == beam:\n", + " break\n", + " \n", + " num_hyp = len(ids)\n", + " if num_hyp == 0:\n", + " return None\n", + " else:\n", + " index = np.array(ids)\n", + " base_index = base_index[index]\n", + " child_index = child_index[index]\n", + " hyp_type_scores = sess.run(model.hyp_type_scores,\n", + " feed_dict = {\n", + " model.holder_type_h: type_h[base_index],\n", + " model.holder_type_c: type_c[child_index]\n", + " })\n", + " hyp_types = np.argmax(hyp_type_scores, axis = 1)\n", + " hyp_type_scores = np.max(hyp_type_scores, axis = 1)\n", + " hypothesis_scores[:num_hyp] = hypothesis_scores[:num_hyp] + hyp_type_scores\n", + "\n", + " for i in range(num_hyp):\n", + " base_id = base_index[i]\n", + " new_stacked_types[i] = stacked_types[base_id]\n", + " new_stacked_types[i, t] = hyp_types[i]\n", + "\n", + " stacked_heads = [[new_stacked_heads[i][j] for j in range(len(new_stacked_heads[i]))] for i in range(num_hyp)]\n", + " grand_parents = [[new_grand_parents[i][j] for j in range(len(new_grand_parents[i]))] for i in range(num_hyp)]\n", + " siblings = [[new_siblings[i][j] for j in range(len(new_siblings[i]))] for i in range(num_hyp)]\n", + " constraints = new_constraints\n", + " child_orders = new_child_orders\n", + " children = np.copy(new_children)\n", + " stacked_types = np.copy(new_stacked_types)\n", + " \n", + " children = children[0].astype(np.int32)\n", + " stacked_types = stacked_types[0].astype(np.int32)\n", + " heads = np.zeros(length, dtype=np.int32)\n", + " types = np.zeros(length, dtype=np.int32)\n", + " stack = [0]\n", + " for i in range(num_step):\n", + " head = stack[-1]\n", + " child = children[i]\n", + " type_ = stacked_types[i]\n", + " if child != head:\n", + " heads[child] = head\n", + " types[child] = type_\n", + " stack.append(child)\n", + " else:\n", + " stacked_types[i] = 0\n", + " stack.pop()\n", + "\n", + " return heads, types, length, children, stacked_types \n", + " \n", + "def decode(input_word, 
input_char, length = None, beam = 1, leading_symbolic=0, ordered=True):\n", + " \n", + " arc_c, type_c, output, hn = sess.run([model.encode_arc_c, model.type_c, \n", + " model.encode_output, model.encode_hidden],\n", + " feed_dict = {model.words: input_word, model.chars: input_char})\n", + " batch, max_len_e, _ = output.shape\n", + "\n", + " heads = np.zeros([batch, max_len_e], dtype=np.int32)\n", + " types = np.zeros([batch, max_len_e], dtype=np.int32)\n", + "\n", + " children = np.zeros([batch, 2 * max_len_e - 1], dtype=np.int32)\n", + " stack_types = np.zeros([batch, 2 * max_len_e - 1], dtype=np.int32)\n", + " \n", + " for b in range(batch):\n", + " sent_len = None if length is None else length[b]\n", + " preds = decode_sentence(output[b], arc_c[b], type_c[b], [hn[b]], \n", + " beam, sent_len, ordered, leading_symbolic)\n", + " if preds is None:\n", + " preds = decode_sentence(output[b], arc_c[b], type_c[b], [hn[b]], beam, \n", + " sent_len, False, leading_symbolic)\n", + " hids, tids, sent_len, chids, stids = preds\n", + " heads[b, :sent_len] = hids\n", + " types[b, :sent_len] = tids\n", + "\n", + " children[b, :2 * sent_len - 1] = chids\n", + " stack_types[b, :2 * sent_len - 1] = stids\n", + "\n", + " return heads, types, children, stack_types" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_char_seq(batch, UNK = 2):\n", + " maxlen_c = max([len(k) for k in batch])\n", + " x = [[len(i) for i in k] for k in batch]\n", + " maxlen = max([j for i in x for j in i])\n", + " temp = np.zeros((len(batch),maxlen_c,maxlen),dtype=np.int32)\n", + " for i in range(len(batch)):\n", + " for k in range(len(batch[i])):\n", + " for no, c in enumerate(batch[i][k]):\n", + " temp[i,k,-1-no] = char2idx.get(c, UNK)\n", + " return temp" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((5, 37), (5, 73))" + ] + }, + 
"execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_char = train_char[:5]\n", + "batch_char = generate_char_seq(batch_char)\n", + "batch_y = train_Y[:5]\n", + "batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')\n", + "\n", + "batch_stacked_heads = stacked_heads_train[:5]\n", + "batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n", + "batch_children = children_train[:5]\n", + "batch_children = pad_sequences(batch_children,padding='post')\n", + "batch_siblings = siblings_train[:5]\n", + "batch_siblings = pad_sequences(batch_siblings,padding='post')\n", + "batch_stacked_types = stacked_types_train[:5]\n", + "batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n", + "batch_e = np.zeros(batch_x.shape)\n", + "batch_d = np.zeros(batch_stacked_heads.shape)\n", + "nonzero = np.count_nonzero(batch_x, axis = 1)\n", + "\n", + "for no, i in enumerate(nonzero):\n", + " batch_e[no,:i] = 1.0\n", + "for no, i in enumerate(nonzero * 2 - 1):\n", + " batch_d[no,:i] = 1.0\n", + " \n", + "batch_x.shape, batch_stacked_heads.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "14.264593" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "feed_dict = {model.words: batch_x,\n", + " model.chars: batch_char,\n", + " model.heads: batch_depends,\n", + " model.stacked_heads: batch_stacked_heads,\n", + " model.childrens: batch_children,\n", + " model.siblings: batch_siblings,\n", + " model.stacked_types: batch_stacked_types,\n", + " model.mask_e: batch_e,\n", + " model.mask_d: batch_d}\n", + 
"sess.run(model.cost, feed_dict = feed_dict)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 2.27 s, sys: 251 ms, total: 2.52 s\n", + "Wall time: 1.32 s\n" + ] + }, + { + "data": { + "text/plain": [ + "(array([[ 0, 0, 1, 0, 1, 6, 1, 1, 7, 0, 0, 12, 0, 0, 15, 8,\n", + " 18, 18, 7, 21, 21, 18, 23, 21, 1, 28, 28, 28, 21, 1, 34, 34,\n", + " 31, 34, 0, 34, 0],\n", + " [ 0, 10, 3, 10, 7, 7, 7, 3, 10, 10, 0, 10, 10, 14, 10, 16,\n", + " 14, 10, 10, 23, 0, 0, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36,\n", + " 36, 36, 36, 36, 21],\n", + " [ 0, 0, 1, 4, 5, 1, 9, 9, 9, 5, 9, 13, 13, 9, 13, 16,\n", + " 14, 1, 0, 0, 25, 0, 0, 25, 25, 22, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0],\n", + " [ 0, 6, 3, 1, 6, 6, 0, 9, 9, 6, 12, 12, 9, 15, 15, 12,\n", + " 6, 0, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0],\n", + " [ 0, 2, 6, 4, 2, 6, 0, 10, 10, 10, 6, 6, 16, 16, 16, 17,\n", + " 6, 16, 17, 18, 18, 22, 17, 27, 27, 27, 27, 22, 31, 31, 31, 27,\n", + " 35, 35, 35, 22, 6]], dtype=int32),\n", + " array([[ 0, 5, 7, 8, 7, 13, 35, 28, 10, 8, 44, 7, 38, 7, 3, 35,\n", + " 2, 3, 4, 2, 3, 14, 2, 14, 7, 2, 3, 13, 14, 7, 7, 7,\n", + " 0, 7, 23, 23, 45],\n", + " [ 0, 7, 3, 6, 2, 3, 13, 14, 18, 18, 5, 10, 10, 2, 4, 11,\n", + " 20, 7, 7, 0, 16, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 4],\n", + " [ 0, 5, 7, 13, 6, 28, 11, 6, 18, 27, 36, 9, 13, 10, 20, 2,\n", + " 4, 7, 7, 23, 0, 23, 23, 0, 0, 4, 23, 23, 23, 23, 23, 23,\n", + " 23, 23, 23, 23, 45],\n", + " [ 0, 6, 2, 14, 18, 19, 5, 2, 9, 4, 2, 3, 14, 2, 3, 14,\n", + " 7, 28, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n", + " 4, 23, 23, 23, 4],\n", + " [ 0, 3, 6, 2, 14, 25, 5, 2, 3, 15, 4, 7, 16, 6, 18, 0,\n", + " 28, 12, 6, 8, 0, 16, 27, 2, 3, 13, 15, 10, 2, 3, 15, 14,\n", + " 2, 2, 3, 4, 7]], dtype=int32),\n", + " array([[ 1, 7, 18, 21, 23, 22, 22, 23, 28, 25, 25, 26, 26, 27, 27, 28,\n", + " 19, 19, 
20, 20, 21, 16, 16, 17, 17, 18, 8, 15, 14, 14, 15, 8,\n", + " 7, 6, 5, 5, 6, 2, 2, 4, 4, 24, 24, 29, 29, 1, 36, 36,\n", + " 34, 35, 35, 30, 30, 31, 32, 32, 31, 33, 33, 34, 3, 3, 10, 10,\n", + " 12, 11, 11, 12, 9, 9, 13, 13, 0],\n", + " [10, 3, 7, 4, 4, 5, 5, 6, 6, 7, 2, 2, 3, 14, 16, 15,\n", + " 15, 16, 13, 13, 14, 1, 1, 8, 8, 9, 9, 11, 11, 12, 12, 17,\n", + " 17, 18, 18, 10, 20, 20, 21, 36, 22, 22, 23, 19, 19, 23, 24, 24,\n", + " 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32,\n", + " 33, 33, 34, 34, 35, 35, 36, 21, 0],\n", + " [ 1, 5, 9, 13, 14, 16, 15, 15, 16, 14, 11, 11, 12, 12, 13, 6,\n", + " 6, 7, 7, 8, 8, 10, 10, 9, 4, 3, 3, 4, 5, 2, 2, 17,\n", + " 17, 1, 36, 36, 35, 35, 34, 34, 33, 33, 32, 32, 31, 31, 30, 30,\n", + " 29, 29, 28, 28, 27, 27, 26, 26, 21, 21, 22, 25, 23, 23, 24, 24,\n", + " 20, 20, 25, 22, 19, 19, 18, 18, 0],\n", + " [ 6, 9, 12, 15, 13, 13, 14, 14, 15, 10, 10, 11, 11, 12, 7, 7,\n", + " 8, 8, 9, 1, 3, 2, 2, 3, 1, 4, 4, 5, 5, 16, 16, 6,\n", + " 36, 36, 35, 35, 34, 34, 33, 33, 32, 32, 31, 31, 30, 30, 29, 29,\n", + " 28, 28, 27, 27, 26, 26, 25, 25, 24, 24, 23, 23, 22, 22, 21, 21,\n", + " 20, 20, 19, 19, 17, 18, 18, 17, 0],\n", + " [ 6, 16, 17, 22, 27, 31, 28, 28, 29, 29, 30, 30, 31, 23, 23, 24,\n", + " 24, 25, 25, 26, 26, 27, 35, 32, 32, 33, 33, 34, 34, 35, 21, 21,\n", + " 22, 18, 19, 19, 20, 20, 18, 15, 15, 17, 12, 12, 13, 13, 14, 14,\n", + " 16, 2, 4, 3, 3, 4, 1, 1, 2, 10, 7, 7, 8, 8, 9, 9,\n", + " 10, 5, 5, 11, 11, 36, 36, 6, 0]], dtype=int32),\n", + " array([[ 5, 28, 4, 14, 14, 2, 0, 0, 14, 2, 0, 3, 0, 13, 0, 0,\n", + " 2, 0, 3, 0, 0, 2, 0, 3, 0, 0, 10, 35, 3, 0, 0, 0,\n", + " 0, 35, 13, 0, 0, 7, 0, 7, 0, 7, 0, 7, 0, 0, 45, 0,\n", + " 23, 23, 0, 7, 0, 7, 0, 0, 0, 7, 0, 0, 8, 0, 44, 0,\n", + " 38, 7, 0, 0, 8, 0, 7, 0, 0],\n", + " [ 5, 6, 14, 2, 0, 3, 0, 13, 0, 0, 3, 0, 0, 4, 20, 11,\n", + " 0, 0, 2, 0, 0, 7, 0, 18, 0, 18, 0, 10, 0, 10, 0, 7,\n", + " 0, 7, 0, 0, 16, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 0, 0, 0, 0, 0],\n", + " [ 5, 28, 27, 10, 20, 4, 2, 0, 0, 0, 9, 0, 13, 0, 0, 11,\n", + " 0, 6, 0, 18, 0, 36, 0, 0, 6, 13, 0, 0, 0, 7, 0, 7,\n", + " 0, 0, 45, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 0,\n", + " 23, 0, 23, 0, 23, 0, 23, 0, 23, 0, 23, 4, 0, 0, 0, 0,\n", + " 0, 0, 0, 0, 23, 0, 7, 0, 0],\n", + " [ 5, 4, 14, 14, 2, 0, 3, 0, 0, 2, 0, 3, 0, 0, 2, 0,\n", + " 9, 0, 0, 6, 14, 2, 0, 0, 0, 18, 0, 19, 0, 7, 0, 0,\n", + " 4, 0, 23, 0, 23, 0, 23, 0, 4, 0, 4, 0, 4, 0, 4, 0,\n", + " 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0, 4, 0,\n", + " 4, 0, 4, 0, 28, 4, 0, 0, 0],\n", + " [ 5, 28, 12, 27, 10, 14, 2, 0, 3, 0, 15, 0, 0, 2, 0, 3,\n", + " 0, 13, 0, 15, 0, 0, 4, 2, 0, 2, 0, 3, 0, 0, 16, 0,\n", + " 0, 6, 8, 0, 0, 0, 0, 0, 0, 0, 16, 0, 6, 0, 18, 0,\n", + " 0, 6, 14, 2, 0, 0, 3, 0, 0, 4, 2, 0, 3, 0, 15, 0,\n", + " 0, 25, 0, 7, 0, 7, 0, 0, 0]], dtype=int32))" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%time\n", + "decode(batch_x, batch_char)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=2.97]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.70it/s, cost=12.1]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:01, 6.08it/s, cost=3.28]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 0, training loss: 5.157737, valid loss: 11.861909\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=2.01]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.53it/s, cost=13.8]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:07, 5.52it/s, cost=2.31]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + 
"text": [ + "epoch: 1, training loss: 2.576627, valid loss: 13.340673\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=1.55] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.33it/s, cost=15.4]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:02, 5.99it/s, cost=1.77]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 2, training loss: 1.922838, valid loss: 14.725556\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=1.36] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.20it/s, cost=16.4]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.70it/s, cost=1.47]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 3, training loss: 1.529883, valid loss: 15.789502\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=1.12] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.20it/s, cost=17.9]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.88it/s, cost=1.2]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 4, training loss: 1.266019, valid loss: 17.307760\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=1.02] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.87it/s, cost=19.5]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.93it/s, cost=1.06]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 5, training loss: 1.066313, valid loss: 19.008535\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": 
"stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=0.878]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.69it/s, cost=21.3]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:02, 6.03it/s, cost=0.895]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 6, training loss: 0.908035, valid loss: 20.994354\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.13it/s, cost=0.748]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.86it/s, cost=22.6]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:02, 6.03it/s, cost=0.771]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 7, training loss: 0.780265, valid loss: 22.426714\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.11it/s, cost=0.636]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.22it/s, cost=24.4]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:03, 5.93it/s, cost=0.615]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 8, training loss: 0.687402, valid loss: 24.419289\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=0.628]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.42it/s, cost=26.8]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:05, 5.75it/s, cost=0.546]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 9, training loss: 0.609938, valid loss: 26.764641\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.18it/s, cost=0.613]\n", + "test 
minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 17.83it/s, cost=28.2]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.65it/s, cost=0.52]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 10, training loss: 0.525183, valid loss: 28.478970\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.15it/s, cost=0.538]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.67it/s, cost=31.2]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.62it/s, cost=0.484]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 11, training loss: 0.459827, valid loss: 31.322876\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.17it/s, cost=0.512] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 19.11it/s, cost=32.4]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:06, 5.59it/s, cost=0.367]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 12, training loss: 0.400364, valid loss: 33.366253\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:01<00:00, 6.14it/s, cost=0.413] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.76it/s, cost=34.1]\n", + "train minibatch loop: 0%| | 1/375 [00:00<01:02, 5.95it/s, cost=0.316]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 13, training loss: 0.357156, valid loss: 34.881569\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:00<00:00, 6.16it/s, cost=0.331] \n", + "test minibatch loop: 100%|██████████| 120/120 [00:06<00:00, 18.91it/s, cost=36.8]" + ] + }, + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "epoch: 14, training loss: 0.307119, valid loss: 37.149876\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "batch_size = 32\n", + "epoch = 15\n", + "\n", + "for e in range(epoch):\n", + " test_loss, train_loss = [], []\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(train_X), batch_size), desc = 'train minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_X))\n", + " batch_x = train_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_char = train_char[i: index]\n", + " batch_char = generate_char_seq(batch_char)\n", + " batch_y = train_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = train_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + "\n", + " batch_stacked_heads = stacked_heads_train[i: index]\n", + " batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n", + " batch_children = children_train[i: index]\n", + " batch_children = pad_sequences(batch_children,padding='post')\n", + " batch_siblings = siblings_train[i: index]\n", + " batch_siblings = pad_sequences(batch_siblings,padding='post')\n", + " batch_stacked_types = stacked_types_train[i: index]\n", + " batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n", + " batch_e = np.zeros(batch_x.shape)\n", + " batch_d = np.zeros(batch_stacked_heads.shape)\n", + " nonzero = np.count_nonzero(batch_x, axis = 1)\n", + "\n", + " for no, i in enumerate(nonzero):\n", + " batch_e[no,:i] = 1.0\n", + " for no, i in enumerate(nonzero * 2 - 1):\n", + " batch_d[no,:i] = 1.0\n", + " \n", + " feed_dict = {model.words: batch_x,\n", + " model.chars: batch_char,\n", + " model.heads: batch_depends,\n", + " model.stacked_heads: batch_stacked_heads,\n", + " model.childrens: batch_children,\n", + " 
model.siblings: batch_siblings,\n", + " model.stacked_types: batch_stacked_types,\n", + " model.mask_e: batch_e,\n", + " model.mask_d: batch_d}\n", + " cost, _ = sess.run([model.cost, model.optimizer], feed_dict = feed_dict)\n", + " train_loss.append(cost)\n", + " pbar.set_postfix(cost = cost)\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_X), batch_size), desc = 'test minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_char = test_char[i: index]\n", + " batch_char = generate_char_seq(batch_char)\n", + " batch_y = test_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = test_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + "\n", + " batch_stacked_heads = stacked_heads_test[i: index]\n", + " batch_stacked_heads = pad_sequences(batch_stacked_heads,padding='post')\n", + " batch_children = children_test[i: index]\n", + " batch_children = pad_sequences(batch_children,padding='post')\n", + " batch_siblings = siblings_test[i: index]\n", + " batch_siblings = pad_sequences(batch_siblings,padding='post')\n", + " batch_stacked_types = stacked_types_test[i: index]\n", + " batch_stacked_types = pad_sequences(batch_stacked_types,padding='post')\n", + " batch_e = np.zeros(batch_x.shape)\n", + " batch_d = np.zeros(batch_stacked_heads.shape)\n", + " nonzero = np.count_nonzero(batch_x, axis = 1)\n", + "\n", + " for no, i in enumerate(nonzero):\n", + " batch_e[no,:i] = 1.0\n", + " for no, i in enumerate(nonzero * 2 - 1):\n", + " batch_d[no,:i] = 1.0\n", + " \n", + " feed_dict = {model.words: batch_x,\n", + " model.chars: batch_char,\n", + " model.heads: batch_depends,\n", + " model.stacked_heads: batch_stacked_heads,\n", + " model.childrens: batch_children,\n", + " model.siblings: batch_siblings,\n", + " model.stacked_types: 
batch_stacked_types,\n", + " model.mask_e: batch_e,\n", + " model.mask_d: batch_d}\n", + " cost = sess.run(model.cost, feed_dict = feed_dict)\n", + " test_loss.append(cost)\n", + " pbar.set_postfix(cost = cost)\n", + " \n", + " print(\n", + " 'epoch: %d, training loss: %f, valid loss: %f\\n'\n", + " % (e, np.mean(train_loss), np.mean(test_loss)))\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "def evaluate(heads_pred, types_pred, heads, types, lengths,\n", + " symbolic_root=False, symbolic_end=False):\n", + " batch_size, _ = heads_pred.shape\n", + " ucorr = 0.\n", + " lcorr = 0.\n", + " total = 0.\n", + " ucomplete_match = 0.\n", + " lcomplete_match = 0.\n", + "\n", + " corr_root = 0.\n", + " total_root = 0.\n", + " start = 1 if symbolic_root else 0\n", + " end = 1 if symbolic_end else 0\n", + " for i in range(batch_size):\n", + " ucm = 1.\n", + " lcm = 1.\n", + " for j in range(start, lengths[i] - end):\n", + "\n", + " total += 1\n", + " if heads[i, j] == heads_pred[i, j]:\n", + " ucorr += 1\n", + " if types[i, j] == types_pred[i, j]:\n", + " lcorr += 1\n", + " else:\n", + " lcm = 0\n", + " else:\n", + " ucm = 0\n", + " lcm = 0\n", + "\n", + " if heads[i, j] == 0:\n", + " total_root += 1\n", + " corr_root += 1 if heads_pred[i, j] == 0 else 0\n", + "\n", + " ucomplete_match += ucm\n", + " lcomplete_match += lcm\n", + " \n", + " return ucorr / total, lcorr / total, corr_root / total_root" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.6045627376425855, 0.5209125475285171, 0.90625)" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "heads, types, _, _ = decode(batch_x, batch_char)\n", + "arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, types, batch_depends, batch_y, \n", + " np.count_nonzero(batch_x, axis = 1))\n", + "arc_accuracy, 
type_accuracy, root_accuracy" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "arcs, types, roots = [], [], []\n", + "\n", + "for i in range(0, len(test_X), 5):\n", + " index = min(i + 5, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_char = test_char[i: index]\n", + " batch_char = generate_char_seq(batch_char)\n", + " batch_y = test_Y[i: index]\n", + " batch_y = pad_sequences(batch_y,padding='post')\n", + " batch_depends = test_depends[i: index]\n", + " batch_depends = pad_sequences(batch_depends,padding='post')\n", + " \n", + " heads, tags_seq, _, _ = decode(batch_x, batch_char)\n", + " \n", + " arc_accuracy, type_accuracy, root_accuracy = evaluate(heads, tags_seq, batch_depends, batch_y, \n", + " np.count_nonzero(batch_x, axis = 1))\n", + " arcs.append(arc_accuracy)\n", + " types.append(type_accuracy)\n", + " roots.append(root_accuracy)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "arc accuracy: 0.6188156085110088\n", + "types accuracy: 0.482035002661857\n", + "root accuracy: 0.8939869281045753\n" + ] + } + ], + "source": [ + "print('arc accuracy:', np.mean(arcs))\n", + "print('types accuracy:', np.mean(types))\n", + "print('root accuracy:', np.mean(roots))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git 
a/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb b/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb new file mode 100644 index 0000000..dc02b25 --- /dev/null +++ b/dependency-parser/8.xlnet-biaffine-attention-cross-entropy.ipynb @@ -0,0 +1,1608 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-train.conllu\n", + "# !wget https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-test.conllu\n", + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n", + "# !unzip xlnet.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "tag2idx = {'PAD': 0, 'X': 1}\n", + "tag_idx = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import sentencepiece as spm\n", + "from prepro_utils import preprocess_text, encode_ids\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "\n", + "def tokenize_fn(text):\n", + "    text = preprocess_text(text, lower= False)\n", + "    return encode_ids(sp_model, text)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP = 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + "    \"<unk>\" : 0,\n", + "    \"<s>\" : 1,\n", + "    \"</s>\" : 2,\n", + "    \"<cls>\" : 3,\n", + "    \"<sep>\" : 4,\n", + "    \"<pad>\" : 5,\n", + "    \"<mask>\" : 6,\n", + "    \"<eod>\" : 7,\n", + "    \"<eop>\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"<unk>\"]\n", + "CLS_ID = special_symbols[\"<cls>\"]\n", + "SEP_ID = special_symbols[\"<sep>\"]\n", + "MASK_ID = special_symbols[\"<mask>\"]\n", + "EOD_ID = special_symbols[\"<eod>\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def process_corpus(corpus, until = None):\n", + "    global word2idx, tag2idx, char2idx, word_idx, tag_idx, char_idx\n", + "    sentences, words, depends, labels, pos, sequences = [], [], [], [], [], []\n", + "    temp_sentence, temp_word, temp_depend, temp_label, temp_pos = [], [], [], [], []\n", + "    segments, masks = [], []\n", + "    first_time = True\n", + "    for sentence in corpus:\n", + "        try:\n", + "            if len(sentence):\n", + "                if sentence[0] == '#':\n", + "                    continue\n", + "                if first_time:\n", + "                    print(sentence)\n", + "                    first_time = False\n", + "                sentence = sentence.split('\\t')\n", + "                if sentence[7] not in tag2idx:\n", + "                    tag2idx[sentence[7]] = tag_idx\n", + "                    tag_idx += 1\n", + "                temp_word.append(sentence[1])\n", + "                temp_depend.append(int(sentence[6]) + 1)\n", + "                temp_label.append(tag2idx[sentence[7]])\n", + "                temp_sentence.append(sentence[1])\n", + "                temp_pos.append(sentence[3])\n", + "            else:\n", + "                if len(temp_sentence) < 2 or len(temp_word) != len(temp_label):\n", + "                    temp_word = []\n", + "                    temp_depend = []\n", + "                    temp_label = []\n", + "                    temp_sentence = []\n", + "                    temp_pos = []\n", + "                    continue\n", + "                bert_tokens = []\n", + "                labels_ = []\n", + "                depends_ = []\n", + "                seq_ = []\n", + "                for no, orig_token in enumerate(temp_word):\n", + "                    t = tokenize_fn(orig_token)\n", + "                    labels_.append(temp_label[no])\n", + "                    depends_.append(temp_depend[no])\n", + "                    bert_tokens.extend(t)\n", + "                    labels_.extend([1] * (len(t) - 1))\n", + "                    depends_.extend([0] * (len(t) - 1))\n", + "                    seq_.append(no + 1)\n", + "                bert_tokens.extend([4, 3])\n", + "                labels_.extend([0, 0])\n", +
" depends_.extend([0, 0])\n", + " segment = [0] * (len(bert_tokens) - 1) + [SEG_ID_CLS]\n", + " input_mask = [0] * len(segment)\n", + " words.append(bert_tokens)\n", + " depends.append(depends_)\n", + " labels.append(labels_)\n", + " sentences.append(temp_sentence)\n", + " pos.append(temp_pos)\n", + " sequences.append(seq_)\n", + " segments.append(segment)\n", + " masks.append(input_mask)\n", + " temp_word = []\n", + " temp_depend = []\n", + " temp_label = []\n", + " temp_sentence = []\n", + " temp_pos = []\n", + " except Exception as e:\n", + " print(e, sentence)\n", + " return sentences[:-1], words[:-1], depends[:-1], labels[:-1], pos[:-1], sequences[:-1], segments[:-1], masks[:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tFrom\tfrom\tADP\tIN\t_\t3\tcase\t3:case\t_\n", + "invalid literal for int() with base 10: '_' ['10.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '8:parataxis', 'CopyOf=-1']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'has', 'have', 'VERB', 'VBZ', '_', '_', '_', '16:conj:and', 'CopyOf=-1']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-dev.conllu') as fopen:\n", + " dev = fopen.read().split('\\n')\n", + "\n", + "sentences_dev, words_dev, depends_dev, labels_dev, _, seq_dev, segments_dev, masks_dev = process_corpus(dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_\n", + "invalid literal for int() with base 10: '_' ['24.1', 'left', 'left', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '6:parataxis', 'CopyOf=6']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-test.conllu') as fopen:\n", + " test = fopen.read().split('\\n')\n", + "\n", + "sentences_test, words_test, depends_test, labels_test, _, seq_test, segments_test, 
masks_test = process_corpus(test)\n", + "sentences_test.extend(sentences_dev)\n", + "words_test.extend(words_dev)\n", + "depends_test.extend(depends_dev)\n", + "labels_test.extend(labels_dev)\n", + "seq_test.extend(seq_dev)\n", + "segments_test.extend(segments_dev)\n", + "masks_test.extend(masks_dev)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t0:root\tSpaceAfter=No\n", + "invalid literal for int() with base 10: '_' ['8.1', 'reported', 'report', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '5:conj:and', 'CopyOf=5']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'used', 'use', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part', '_', '_', '13:advcl:with|17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'called', 'call', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['14.1', 'is', 'be', 'VERB', 'VBZ', '_', '_', '_', '1:conj:and', 'CopyOf=1']\n", + "invalid literal for int() with base 10: '_' ['20.1', 'reflect', 'reflect', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '7:acl:relcl|9:conj', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'recruited', 'recruit', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '9:conj:and', 'CopyOf=9']\n", + "invalid literal for int() with base 10: '_' ['9.1', 'wish', 'wish', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['38.1', 'supplied', 'supply', 'VERB', 'VBN', 'Tense=Past|VerbForm=Part|Voice=Pass', '_', '_', 
'16:conj:and', 'CopyOf=16']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['21.1', 'keep', 'keep', 'VERB', 'VB', 'Mood=Imp|VerbForm=Fin', '_', '_', '14:conj:and', 'CopyOf=14']\n", + "invalid literal for int() with base 10: '_' ['18.1', 'mean', 'mean', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '8:conj', 'CopyOf=8']\n", + "invalid literal for int() with base 10: '_' ['30.1', 'play', 'play', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '18:acl:relcl|27:conj:but', 'CopyOf=27']\n", + "invalid literal for int() with base 10: '_' ['22.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['27.1', 'have', 'have', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '17:conj', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['49.1', 'helped', 'help', 'VERB', 'VBD', '_', '_', '_', '38:conj:but', 'CopyOf=38']\n", + "invalid literal for int() with base 10: '_' ['7.1', 'found', 'find', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'excited', 'excited', 'ADJ', 'JJ', 'Degree=Pos', '_', '_', '4:advcl', 'CopyOf=4']\n", + "invalid literal for int() with base 10: '_' ['15.1', \"'s\", 'be', 'VERB', 'VBZ', '_', '_', '_', '2:conj:and', 'CopyOf=2']\n", + "invalid literal for int() with base 10: '_' ['25.1', 'took', 'take', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '17:conj:and', 'CopyOf=17']\n", + "invalid literal for int() with base 10: '_' ['10.1', 'loss', 'lose', 'VERB', 'VBD', 'Mood=Ind|Tense=Past|VerbForm=Fin', '_', '_', '3:conj:and', 'CopyOf=3']\n", + "invalid literal for int() with base 10: '_' ['11.1', 'leave', 'leave', 'VERB', 'VB', 'VerbForm=Inf', '_', '_', '7:parataxis', 
'CopyOf=7']\n", + "invalid literal for int() with base 10: '_' ['24.1', 'charge', 'charge', 'VERB', 'VBP', 'Mood=Ind|Tense=Pres|VerbForm=Fin', '_', '_', '16:conj:and', 'CopyOf=16']\n" + ] + } + ], + "source": [ + "with open('en_ewt-ud-train.conllu') as fopen:\n", + " train = fopen.read().split('\\n')\n", + "\n", + "sentences_train, words_train, depends_train, labels_train, _, _, segments_train, masks_train = process_corpus(train)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(12000, 3824)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(sentences_train), len(sentences_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "idx2tag = {v:k for k, v in tag2idx.items()}" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = words_train\n", + "train_Y = labels_train\n", + "train_depends = depends_train\n", + "\n", + "test_X = words_test\n", + "test_Y = labels_test\n", + "test_depends = depends_test" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/usr/lib/python3/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.25.6) or chardet (3.0.4) doesn't match a supported version!\n", + " RequestsDependencyWarning)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/testing/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:70: The name tf.gfile.Open is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) 
/ '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import xlnet\n", + "import model_utils\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.1,\n", + " dropatt=0.1,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "5625 562\n" + ] + } + ], + "source": [ + "epoch = 15\n", + "batch_size = 32\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(train_X) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)\n", + "print(num_train_steps, num_warmup_steps)\n", + "\n", + "training_parameters = dict(\n", + " decay_method = 'poly',\n", + " train_steps = num_train_steps,\n", + " learning_rate = 2e-5,\n", + " warmup_steps = num_warmup_steps,\n", + " min_lr_ratio = 0.0,\n", + " weight_decay = 0.00,\n", + " adam_epsilon = 1e-8,\n", + " num_core_per_host = 1,\n", + " lr_layer_decay_rate = 1,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.02,\n", + " clip = 1.0,\n", + " clamp_len=-1,)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "class Parameter:\n", + " def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n", + " num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n", + " min_lr_ratio, clip, **kwargs):\n", + " self.decay_method = decay_method\n", + " self.warmup_steps = 
warmup_steps\n", + " self.weight_decay = weight_decay\n", + " self.adam_epsilon = adam_epsilon\n", + " self.num_core_per_host = num_core_per_host\n", + " self.lr_layer_decay_rate = lr_layer_decay_rate\n", + " self.use_tpu = use_tpu\n", + " self.learning_rate = learning_rate\n", + " self.train_steps = train_steps\n", + " self.min_lr_ratio = min_lr_ratio\n", + " self.clip = clip\n", + " \n", + "training_parameters = Parameter(**training_parameters)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "class BiAAttention:\n", + " def __init__(self, input_size_encoder, input_size_decoder, num_labels):\n", + " self.input_size_encoder = input_size_encoder\n", + " self.input_size_decoder = input_size_decoder\n", + " self.num_labels = num_labels\n", + " \n", + " self.W_d = tf.get_variable(\"W_d\", shape=[self.num_labels, self.input_size_decoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_e = tf.get_variable(\"W_e\", shape=[self.num_labels, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.U = tf.get_variable(\"U\", shape=[self.num_labels, self.input_size_decoder, self.input_size_encoder],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_d, input_e, mask_d=None, mask_e=None):\n", + " batch = tf.shape(input_d)[0]\n", + " length_decoder = tf.shape(input_d)[1]\n", + " length_encoder = tf.shape(input_e)[1]\n", + " out_d = tf.expand_dims(tf.matmul(self.W_d, tf.transpose(input_d, [0, 2, 1])), 3)\n", + " out_e = tf.expand_dims(tf.matmul(self.W_e, tf.transpose(input_e, [0, 2, 1])), 2)\n", + " output = tf.matmul(tf.expand_dims(input_d, 1), self.U)\n", + " output = tf.matmul(output, tf.transpose(tf.expand_dims(input_e, 1), [0, 1, 3, 2]))\n", + " \n", + " output = output + out_d + out_e\n", + " \n", + " if mask_d is not None:\n", + " d = tf.expand_dims(tf.expand_dims(mask_d, 1), 3)\n", + " e = 
tf.expand_dims(tf.expand_dims(mask_e, 1), 2)\n", + " output = output * d * e\n", + " \n", + " return output\n", + " \n", + "class BiLinear:\n", + " def __init__(self, left_features, right_features, out_features):\n", + " self.left_features = left_features\n", + " self.right_features = right_features\n", + " self.out_features = out_features\n", + " \n", + " self.U = tf.get_variable(\"U-bi\", shape=[out_features, left_features, right_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_l = tf.get_variable(\"Wl\", shape=[out_features, left_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " self.W_r = tf.get_variable(\"Wr\", shape=[out_features, right_features],\n", + " initializer=tf.contrib.layers.xavier_initializer())\n", + " \n", + " def forward(self, input_left, input_right):\n", + " left_size = tf.shape(input_left)\n", + " output_shape = tf.concat([left_size[:-1], [self.out_features]], axis = 0)\n", + " batch = tf.cast(tf.reduce_prod(left_size[:-1]), tf.int32)\n", + " input_left = tf.reshape(input_left, (batch, self.left_features))\n", + " input_right = tf.reshape(input_right, (batch, self.right_features))\n", + " tiled = tf.tile(tf.expand_dims(input_left, axis = 0), (self.out_features,1,1))\n", + " output = tf.transpose(tf.reduce_sum(tf.matmul(tiled, self.U), axis = 2))\n", + " output = output + tf.matmul(input_left, tf.transpose(self.W_l))\\\n", + " + tf.matmul(input_right, tf.transpose(self.W_r))\n", + " \n", + " return tf.reshape(output, output_shape)\n", + "\n", + "class Attention:\n", + " def __init__(self, word_dim, num_words, char_dim, num_chars, num_filters, kernel_size,\n", + " hidden_size, encoder_layers, num_labels, arc_space, type_space):\n", + " \n", + " def cells(size, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size,\n", + " initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.word_embedd = tf.Variable(tf.random_uniform([num_words, word_dim], -1, 1))\n", + 
" self.char_embedd = tf.Variable(tf.random_uniform([num_chars, char_dim], -1, 1))\n", + " self.conv1d = tf.layers.Conv1D(num_filters, kernel_size, 1, padding='VALID')\n", + " self.num_labels = num_labels\n", + " self.encoder = tf.nn.rnn_cell.MultiRNNCell([cells(hidden_size) for _ in range(encoder_layers)])\n", + "\n", + " \n", + " \n", + " def encode(self, input_word, input_char):\n", + " word = tf.nn.embedding_lookup(self.word_embedd, input_word)\n", + " char = tf.nn.embedding_lookup(self.char_embedd, input_char)\n", + " b = tf.shape(char)[0]\n", + " wl = tf.shape(char)[1]\n", + " cl = tf.shape(char)[2]\n", + " d = char.shape[3]\n", + " char = tf.reshape(char, [b * wl, cl, d])\n", + " char = tf.reduce_max(self.conv1d(char), axis = 1)\n", + " char = tf.nn.tanh(char)\n", + " d = char.shape[-1]\n", + " char = tf.reshape(char, [b, wl, d])\n", + " \n", + " src_encoding = tf.concat([word, char], axis=2)\n", + " output, hn = tf.nn.dynamic_rnn(self.encoder, src_encoding, dtype = tf.float32,\n", + " scope = 'encoder')\n", + " arc_h = tf.nn.elu(self.arc_h(output))\n", + " arc_c = tf.nn.elu(self.arc_c(output))\n", + " \n", + " type_h = tf.nn.elu(self.type_h(output))\n", + " type_c = tf.nn.elu(self.type_c(output))\n", + " \n", + " return (arc_h, arc_c), (type_h, type_c), hn\n", + " \n", + " def forward(self, input_word, input_char, mask):\n", + " arcs, types, _ = self.encode(input_word, input_char)\n", + " \n", + " out_arc = tf.squeeze(self.attention.forward(arcs[0], arcs[1], mask_d=mask, mask_e=mask), axis = 1)\n", + " return out_arc, types, mask\n", + " \n", + " def loss(self, input_word, input_char, mask, heads, types):\n", + " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n", + " type_h, type_c = out_type\n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = 
tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " minus_inf = -1e8\n", + " minus_mask = (1 - mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_type = tf.nn.log_softmax(out_type, dim=2)\n", + " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n", + " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n", + " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])\n", + " \n", + " t = tf.transpose(types)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_type = tf.gather_nd(loss_type, concatenated)\n", + " loss_type = tf.transpose(loss_type, [1, 0])\n", + " return tf.reduce_sum(-loss_arc) / num, tf.reduce_sum(-loss_type) / num\n", + " \n", + " def decode(self, input_word, input_char, mask, leading_symbolic=0):\n", + " out_arc, out_type, _ = self.forward(input_word, input_char, mask)\n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " out_arc = out_arc + 
tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " out_arc = tf.where(minus_mask, tf.fill(tf.shape(out_arc), -np.inf), out_arc)\n", + " heads = tf.argmax(out_arc, axis = 1)\n", + " type_h, type_c = out_type\n", + " batch = tf.shape(type_h)[0]\n", + " max_len = tf.shape(type_h)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.cast(tf.transpose(heads), tf.int32)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " out_type = out_type[:, :, leading_symbolic:]\n", + " types = tf.argmax(out_type, axis = 2)\n", + " return heads, types\n", + " \n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " learning_rate,\n", + " hidden_size_word,\n", + " cov = 0.0):\n", + " \n", + " self.words = tf.placeholder(tf.int32, (None, None))\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.heads = tf.placeholder(tf.int32, (None, None))\n", + " self.types = tf.placeholder(tf.int32, (None, None))\n", + " self.mask = tf.cast(tf.math.not_equal(self.words, 0), tf.float32)\n", + " self.maxlen = tf.shape(self.words)[1]\n", + " self.lengths = tf.count_nonzero(self.words, 1)\n", + " mask = self.mask\n", + " heads = self.heads\n", + " types = self.types\n", + " \n", + " self.arc_h = tf.layers.Dense(hidden_size_word)\n", + " self.arc_c = tf.layers.Dense(hidden_size_word)\n", + " self.attention = BiAAttention(hidden_size_word, hidden_size_word, 1)\n", + "\n", + " self.type_h = tf.layers.Dense(hidden_size_word)\n", + " self.type_c = tf.layers.Dense(hidden_size_word)\n", + " self.bilinear = 
BiLinear(hidden_size_word, hidden_size_word, len(tag2idx))\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=tf.transpose(self.words, [1, 0]),\n", + " seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n", + " input_mask=tf.transpose(self.input_masks, [1, 0]))\n", + " output_layer = xlnet_model.get_sequence_output()\n", + " output_layer = tf.transpose(output_layer, [1, 0, 2])\n", + " \n", + " arc_h = tf.nn.elu(self.arc_h(output_layer))\n", + " arc_c = tf.nn.elu(self.arc_c(output_layer))\n", + " \n", + " type_h = tf.nn.elu(self.type_h(output_layer))\n", + " type_c = tf.nn.elu(self.type_c(output_layer))\n", + " \n", + " out_arc = tf.squeeze(self.attention.forward(arc_h, arc_h, mask_d=self.mask, \n", + " mask_e=self.mask), axis = 1)\n", + " \n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " sec_max_len = tf.shape(out_arc)[2]\n", + " batch_index = tf.range(0, batch)\n", + " \n", + " decode_arc = out_arc + tf.linalg.diag(tf.fill([max_len], -np.inf))\n", + " minus_mask = tf.expand_dims(tf.cast(1 - mask, tf.bool), axis = 2)\n", + " minus_mask = tf.tile(minus_mask, [1, 1, sec_max_len])\n", + " decode_arc = tf.where(minus_mask, tf.fill(tf.shape(decode_arc), -np.inf), decode_arc)\n", + " self.heads_seq = tf.argmax(decode_arc, axis = 1)\n", + " \n", + " t = tf.cast(tf.transpose(self.heads_seq), tf.int32)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " self.tags_seq = tf.argmax(out_type, axis = 2)\n", + " \n", + " batch = tf.shape(out_arc)[0]\n", + " max_len = tf.shape(out_arc)[1]\n", + " batch_index = tf.range(0, batch)\n", + " t = tf.transpose(heads)\n", + " broadcasted = 
tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0), \n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " type_h = tf.gather_nd(type_h, concatenated)\n", + " out_type = self.bilinear.forward(type_h, type_c)\n", + " minus_inf = -1e8\n", + " minus_mask = (1 - mask) * minus_inf\n", + " out_arc = out_arc + tf.expand_dims(minus_mask, axis = 2) + tf.expand_dims(minus_mask, axis = 1)\n", + " loss_arc = tf.nn.log_softmax(out_arc, dim=1)\n", + " loss_type = tf.nn.log_softmax(out_type, dim=2)\n", + " loss_arc = loss_arc * tf.expand_dims(mask, axis = 2) * tf.expand_dims(mask, axis = 1)\n", + " loss_type = loss_type * tf.expand_dims(mask, axis = 2)\n", + " num = tf.reduce_sum(mask) - tf.cast(batch, tf.float32)\n", + " child_index = tf.tile(tf.expand_dims(tf.range(0, max_len), 1), [1, batch])\n", + " t = tf.transpose(heads)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(t, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0)], axis = 0))\n", + " loss_arc = tf.gather_nd(loss_arc, concatenated)\n", + " loss_arc = tf.transpose(loss_arc, [1, 0])\n", + " \n", + " t = tf.transpose(types)\n", + " broadcasted = tf.broadcast_to(batch_index, tf.shape(t))\n", + " concatenated = tf.transpose(tf.concat([tf.expand_dims(broadcasted, axis = 0),\n", + " tf.expand_dims(child_index, axis = 0),\n", + " tf.expand_dims(t, axis = 0)], axis = 0))\n", + " loss_type = tf.gather_nd(loss_type, concatenated)\n", + " loss_type = tf.transpose(loss_type, [1, 0])\n", + " self.cost = (tf.reduce_sum(-loss_arc) / num) + (tf.reduce_sum(-loss_type) / num)\n", + " self.optimizer = tf.train.AdamOptimizer(\n", + " learning_rate = learning_rate\n", + " ).minimize(self.cost)\n", + " \n", + " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", + " \n", + " self.prediction = 
tf.boolean_mask(self.tags_seq, mask)\n", + " mask_label = tf.boolean_mask(self.types, mask)\n", + " correct_pred = tf.equal(tf.cast(self.prediction, tf.int32), mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " self.prediction = tf.cast(tf.boolean_mask(self.heads_seq, mask), tf.int32)\n", + " mask_label = tf.boolean_mask(self.heads, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy_depends = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.AUTO_REUSE is deprecated. 
Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:686: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:693: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:797: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:99: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From :219: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :242: calling log_softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "dim is deprecated, use axis instead\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "learning_rate = 2e-5\n", + "hidden_size_word = 128\n", + "\n", + "model = Model(learning_rate, hidden_size_word)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + 
':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_X[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_y = train_Y[:5]\n", + "batch_y = pad_sequences(batch_y,padding='post')\n", + "batch_depends = train_depends[:5]\n", + "batch_depends = pad_sequences(batch_depends,padding='post')\n", + "batch_segments = segments_train[:5]\n", + "batch_segments = pad_sequences(batch_segments, padding='post', value = 4)\n", + "batch_masks = masks_train[:5]\n", + "batch_masks = pad_sequences(batch_masks, padding='post', value = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.0, 0.0397351, 242.8986]" + ] + }, + 
"execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sess.run([model.accuracy, model.accuracy_depends, model.cost],\n", + " feed_dict = {model.words: batch_x,\n", + " model.types: batch_y,\n", + " model.heads: batch_depends,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks})" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([30, 30, 30, 26, 30, 30, 26, 30, 30, 30, 30, 26, 30, 30, 30, 30, 30,\n", + " 43, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 26, 30, 30, 30, 30, 26,\n", + " 30, 20, 30, 30, 26, 43, 43, 30, 30, 30, 30, 30]),\n", + " array([16, 16, 16, 16, 17, 16, 16, 22, 17, 16, 16, 16, 16, 37, 16, 16, 32,\n", + " 16, 16, 16, 16, 11, 9, 13, 16, 22, 40, 17, 16, 16, 16, 16, 16, 16,\n", + " 16, 23, 22, 16, 16, 16, 16, 0, 0, 0, 0, 0]),\n", + " array([ 1, 2, 0, 2, 0, 2, 0, 7, 8, 2, 8, 0, 9, 9, 9, 0, 9,\n", + " 0, 9, 0, 16, 9, 19, 19, 8, 22, 22, 19, 24, 22, 0, 22, 0, 29,\n", + " 29, 29, 22, 2, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32))" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tags_seq, heads = sess.run(\n", + " [model.tags_seq, model.heads_seq],\n", + " feed_dict = {\n", + " model.words: batch_x,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks\n", + " },\n", + ")\n", + "tags_seq[0], heads[0], batch_depends[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 375/375 [01:38<00:00, 3.81it/s, accuracy=0.142, accuracy_depends=0.0446, cost=7.04]\n", + "test minibatch loop: 100%|██████████| 120/120 [00:09<00:00, 12.27it/s, accuracy=0.132, accuracy_depends=0.0337, cost=6.79]\n", + "train minibatch loop: 0%| | 0/375 [00:00\" : 0,\n", + " \"\" : 1,\n", + " \"\" : 
2,\n", + " \"<cls>\" : 3,\n", + " \"<sep>\" : 4,\n", + " \"<pad>\" : 5,\n", + " \"<mask>\" : 6,\n", + " \"<eod>\" : 7,\n", + " \"<eop>\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"<unk>\"]\n", + "CLS_ID = special_symbols[\"<cls>\"]\n", + "SEP_ID = special_symbols[\"<sep>\"]\n", + "MASK_ID = special_symbols[\"<mask>\"]\n", + "EOD_ID = special_symbols[\"<eod>\"]\n", + "\n", + "def XY(left_train, right_train):\n", + " X, Y, segments, masks = [], [], [], []\n", + " for i in tqdm(range(len(left_train))):\n", + " left = left_train[i]\n", + " right = right_train[i]\n", + " bert_tokens = []\n", + " y = []\n", + " for no, orig_token in enumerate(left):\n", + " y.append(right[no])\n", + " t = tokenize_fn(orig_token)\n", + " bert_tokens.extend(t)\n", + " y.extend(['X'] * (len(t) - 1))\n", + " bert_tokens.extend([4, 3])\n", + " segment = [0] * (len(bert_tokens) - 1) + [SEG_ID_CLS]\n", + " input_mask = [0] * len(segment)\n", + " y.extend(['PAD', 'PAD'])\n", + " X.append(bert_tokens)\n", + " Y.append([tag2idx[i] for i in y])\n", + " segments.append(segment)\n", + " masks.append(input_mask)\n", + " return X, Y, segments, masks" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 203571/203571 [02:48<00:00, 1209.93it/s]\n" + ] + } + ], + "source": [ + "train_X, train_Y, train_segments, train_masks = XY(left_train, right_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 51312/51312 [00:41<00:00, 1224.74it/s]\n" + ] + } + ], + "source": [ + "test_X, test_Y, test_segments, test_masks = XY(left_test, right_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + 
"metadata": {}, + "outputs": [], + "source": [ + "train_X = pad_sequences(train_X, padding='post')\n", + "train_Y = pad_sequences(train_Y, padding='post')\n", + "train_segments = pad_sequences(train_segments, padding='post', value = 4)\n", + "train_masks = pad_sequences(train_masks, padding='post', value = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = pad_sequences(test_X, padding='post')\n", + "test_Y = pad_sequences(test_Y, padding='post')\n", + "test_segments = pad_sequences(test_segments, padding='post', value = 4)\n", + "test_masks = pad_sequences(test_masks, padding='post', value = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/xlnet/xlnet.py:63: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.1,\n", + " dropatt=0.1,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "19084 1908\n" + ] + } + ], + "source": [ + "epoch = 3\n", + "batch_size = 32\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(train_X) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)\n", + "print(num_train_steps, num_warmup_steps)\n", + "\n", + "training_parameters = dict(\n", + " decay_method = 'poly',\n", + " train_steps = num_train_steps,\n", + " learning_rate = 2e-5,\n", + " 
warmup_steps = num_warmup_steps,\n", + " min_lr_ratio = 0.0,\n", + " weight_decay = 0.00,\n", + " adam_epsilon = 1e-8,\n", + " num_core_per_host = 1,\n", + " lr_layer_decay_rate = 1,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.02,\n", + " clip = 1.0,\n", + " clamp_len=-1,)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "class Parameter:\n", + " def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n", + " num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n", + " min_lr_ratio, clip, **kwargs):\n", + " self.decay_method = decay_method\n", + " self.warmup_steps = warmup_steps\n", + " self.weight_decay = weight_decay\n", + " self.adam_epsilon = adam_epsilon\n", + " self.num_core_per_host = num_core_per_host\n", + " self.lr_layer_decay_rate = lr_layer_decay_rate\n", + " self.use_tpu = use_tpu\n", + " self.learning_rate = learning_rate\n", + " self.train_steps = train_steps\n", + " self.min_lr_ratio = min_lr_ratio\n", + " self.clip = clip\n", + " \n", + "training_parameters = Parameter(**training_parameters)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.lengths = tf.count_nonzero(self.X, 1)\n", + " self.maxlen = tf.shape(self.X)[1]\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=tf.transpose(self.X, [1, 0]),\n", + " 
seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n", + " input_mask=tf.transpose(self.input_masks, [1, 0]))\n", + " output_layer = xlnet_model.get_sequence_output()\n", + " output_layer = tf.transpose(output_layer, [1, 0, 2])\n", + " \n", + " logits = tf.layers.dense(output_layer, dimension_output)\n", + " y_t = self.Y\n", + " log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(\n", + " logits, y_t, self.lengths\n", + " )\n", + " self.cost = tf.reduce_mean(-log_likelihood)\n", + " self.optimizer = tf.train.AdamOptimizer(\n", + " learning_rate = learning_rate\n", + " ).minimize(self.cost)\n", + " mask = tf.sequence_mask(self.lengths, maxlen = self.maxlen)\n", + " self.tags_seq, tags_score = tf.contrib.crf.crf_decode(\n", + " logits, transition_params, self.lengths\n", + " )\n", + " self.tags_seq = tf.identity(self.tags_seq, name = 'logits')\n", + "\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(self.tags_seq, mask)\n", + " mask_label = tf.boolean_mask(y_t, mask)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/xlnet/xlnet.py:220: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/xlnet/xlnet.py:220: The name tf.AUTO_REUSE is deprecated. 
Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/xlnet/modeling.py:453: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/xlnet/modeling.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/xlnet/modeling.py:535: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:99: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/crf/python/ops/crf.py:213: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n" + ] + } + ], + "source": [ + "dimension_output = len(tag2idx)\n", + "learning_rate = 2e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + " learning_rate\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = 
name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 6362/6362 [1:26:47<00:00, 1.22it/s, accuracy=0.998, cost=0.563] \n", + "test minibatch loop: 100%|██████████| 1604/1604 [08:00<00:00, 3.34it/s, accuracy=0.98, cost=3.63] \n", + "train minibatch loop: 0%| | 0/6362 [00:00:6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :15: MultiRNNCell.__init__ (from 
tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :16: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :18: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = 
Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "UNK = 3\n", + "\n", + "def str_idx(corpus, dic):\n", + " X = []\n", + " for i in corpus:\n", + " ints = []\n", + " for k in i.split():\n", + " ints.append(dic.get(k,UNK))\n", + " X.append(ints)\n", + " return X\n", + "\n", + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = str_idx(dataset['train_texts'], dictionary)\n", + "test_X = str_idx(dataset['test_texts'], dictionary)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "train_clss = dataset['train_clss']\n", + "test_clss = dataset['test_clss']\n", + "train_Y = dataset['train_labels']\n", + "test_Y = dataset['test_labels']" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.27272728, 0.68941796)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x, _ = pad_sentence_batch(train_X[:5], 0)\n", + "batch_y, _ = pad_sentence_batch(train_Y[:5], 0)\n", + "batch_clss, _ = pad_sentence_batch(train_clss[:5], -1)\n", + "batch_clss = np.array(batch_clss)\n", + "batch_mask = 1 - (batch_clss == -1)\n", + "batch_clss[batch_clss == -1] = 0\n", + "\n", + "feed = {model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.mask: batch_mask,\n", + 
" model.clss: batch_clss}\n", + "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n", + "acc, loss" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 578/578 [23:49<00:00, 2.47s/it, accuracy=0, cost=0.267] \n", + "minibatch loop: 100%|██████████| 145/145 [02:01<00:00, 1.19it/s, accuracy=0, cost=0.221]\n", + "minibatch loop: 0%| | 0/578 [00:00:4: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From :25: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :54: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = 
Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "UNK = 3\n", + "\n", + "def str_idx(corpus, dic):\n", + " X = []\n", + " for i in corpus:\n", + " ints = []\n", + " for k in i.split():\n", + " ints.append(dic.get(k,UNK))\n", + " X.append(ints)\n", + " return X\n", + "\n", + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = str_idx(dataset['train_texts'], dictionary)\n", + "test_X = str_idx(dataset['test_texts'], dictionary)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "train_clss = dataset['train_clss']\n", + "test_clss = dataset['test_clss']\n", + "train_Y = dataset['train_labels']\n", + "test_Y = dataset['test_labels']" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(0.36363637, 0.80718136)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x, _ = pad_sentence_batch(train_X[:5], 0)\n", + "batch_y, _ = pad_sentence_batch(train_Y[:5], 0)\n", + "batch_clss, _ = pad_sentence_batch(train_clss[:5], -1)\n", + "batch_clss = np.array(batch_clss)\n", + "batch_mask = 1 - (batch_clss == -1)\n", + "batch_clss[batch_clss == -1] = 0\n", + "\n", + "feed = {model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.mask: batch_mask,\n", + 
" model.clss: batch_clss}\n", + "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n", + "acc, loss" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 578/578 [05:11<00:00, 1.86it/s, accuracy=0, cost=0.268] \n", + "minibatch loop: 100%|██████████| 145/145 [00:21<00:00, 6.74it/s, accuracy=0, cost=0.221] \n", + "minibatch loop: 0%| | 0/578 [00:00:98: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From :68: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :41: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :30: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(size_layer,embedded_size,len(dictionary),learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + 
] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "UNK = 3\n", + "\n", + "def str_idx(corpus, dic):\n", + " X = []\n", + " for i in corpus:\n", + " ints = []\n", + " for k in i.split():\n", + " ints.append(dic.get(k,UNK))\n", + " X.append(ints)\n", + " return X\n", + "\n", + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = str_idx(dataset['train_texts'], dictionary)\n", + "test_X = str_idx(dataset['test_texts'], dictionary)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "train_clss = dataset['train_clss']\n", + "test_clss = dataset['test_clss']\n", + "train_Y = dataset['train_labels']\n", + "test_Y = dataset['test_labels']" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(1.0, 1.4390177)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x, _ = pad_sentence_batch(train_X[:64], 0)\n", + "batch_y, _ = pad_sentence_batch(train_Y[:64], 0)\n", + "batch_clss, _ = pad_sentence_batch(train_clss[:64], -1)\n", + "batch_clss = np.array(batch_clss)\n", + "batch_mask = 1 - (batch_clss == -1)\n", + "batch_clss[batch_clss == -1] = 0\n", + "\n", + "feed = {model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.mask: batch_mask,\n", + " model.clss: batch_clss}\n", + "acc, loss, _ = sess.run([model.accuracy, model.cost,model.optimizer], feed_dict = feed)\n", + 
"acc, loss" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1156/1156 [06:35<00:00, 2.92it/s, accuracy=0.0167, cost=0.376] \n", + "minibatch loop: 100%|██████████| 289/289 [00:37<00:00, 7.65it/s, accuracy=0.00641, cost=0.397]\n", + "minibatch loop: 0%| | 0/1156 [00:00 `tf.nn.relu`. + + Args: + activation_string: String name of the activation function. + + Returns: + A Python function corresponding to the activation function. If + `activation_string` is None, empty, or "linear", this will return None. + If `activation_string` is not a string, it will return `activation_string`. + + Raises: + ValueError: The `activation_string` does not correspond to a known + activation. + """ + + # We assume that anything that"s not a string is already an activation + # function, so we just return it. + if not isinstance(activation_string, six.string_types): + return activation_string + + if not activation_string: + return None + + act = activation_string.lower() + if act == "linear": + return None + elif act == "relu": + return tf.nn.relu + elif act == "gelu": + return gelu + elif act == "tanh": + return tf.tanh + else: + raise ValueError("Unsupported activation: %s" % act) + + +def get_assignment_map_from_checkpoint(tvars, init_checkpoint): + """Compute the union of the current variables and checkpoint variables.""" + assignment_map = {} + initialized_variable_names = {} + + name_to_variable = collections.OrderedDict() + for var in tvars: + name = var.name + m = re.match("^(.*):\\d+$", name) + if m is not None: + name = m.group(1) + name_to_variable[name] = var + + init_vars = tf.train.list_variables(init_checkpoint) + + assignment_map = collections.OrderedDict() + for x in init_vars: + (name, var) = (x[0], x[1]) + if name not in name_to_variable: + continue + assignment_map[name] = name + initialized_variable_names[name] = 1 + 
initialized_variable_names[name + ":0"] = 1 + + return (assignment_map, initialized_variable_names) + + +def dropout(input_tensor, dropout_prob): + """Perform dropout. + + Args: + input_tensor: float Tensor. + dropout_prob: Python float. The probability of dropping out a value (NOT of + *keeping* a dimension as in `tf.nn.dropout`). + + Returns: + A version of `input_tensor` with dropout applied. + """ + if dropout_prob is None or dropout_prob == 0.0: + return input_tensor + + output = tf.nn.dropout(input_tensor, 1.0 - dropout_prob) + return output + + +def layer_norm(input_tensor, name=None): + """Run layer normalization on the last dimension of the tensor.""" + return tf.contrib.layers.layer_norm( + inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name) + + +def layer_norm_and_dropout(input_tensor, dropout_prob, name=None): + """Runs layer normalization followed by dropout.""" + output_tensor = layer_norm(input_tensor, name) + output_tensor = dropout(output_tensor, dropout_prob) + return output_tensor + + +def create_initializer(initializer_range=0.02): + """Creates a `truncated_normal_initializer` with the given range.""" + return tf.truncated_normal_initializer(stddev=initializer_range) + + +def embedding_lookup(input_ids, + vocab_size, + embedding_size=128, + initializer_range=0.02, + word_embedding_name="word_embeddings", + use_one_hot_embeddings=False): + """Looks up word embeddings for an id tensor. + + Args: + input_ids: int32 Tensor of shape [batch_size, seq_length] containing word + ids. + vocab_size: int. Size of the embedding vocabulary. + embedding_size: int. Width of the word embeddings. + initializer_range: float. Embedding initialization range. + word_embedding_name: string. Name of the embedding table. + use_one_hot_embeddings: bool. If True, use one-hot method for word + embeddings. If False, use `tf.gather()`. + + Returns: + float Tensor of shape [batch_size, seq_length, embedding_size]. 
+ """ + # This function assumes that the input is of shape [batch_size, seq_length, + # num_inputs]. + # + # If the input is a 2D tensor of shape [batch_size, seq_length], we + # reshape to [batch_size, seq_length, 1]. + if input_ids.shape.ndims == 2: + input_ids = tf.expand_dims(input_ids, axis=[-1]) + + embedding_table = tf.get_variable( + name=word_embedding_name, + shape=[vocab_size, embedding_size], + initializer=create_initializer(initializer_range)) + + flat_input_ids = tf.reshape(input_ids, [-1]) + if use_one_hot_embeddings: + one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size) + output = tf.matmul(one_hot_input_ids, embedding_table) + else: + output = tf.gather(embedding_table, flat_input_ids) + + input_shape = get_shape_list(input_ids) + + output = tf.reshape(output, + input_shape[0:-1] + [input_shape[-1] * embedding_size]) + return (output, embedding_table) + + +def embedding_postprocessor(input_tensor, + use_token_type=False, + token_type_ids=None, + token_type_vocab_size=16, + token_type_embedding_name="token_type_embeddings", + use_position_embeddings=True, + position_embedding_name="position_embeddings", + initializer_range=0.02, + max_position_embeddings=512, + dropout_prob=0.1): + """Performs various post-processing on a word embedding tensor. + + Args: + input_tensor: float Tensor of shape [batch_size, seq_length, + embedding_size]. + use_token_type: bool. Whether to add embeddings for `token_type_ids`. + token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length]. + Must be specified if `use_token_type` is True. + token_type_vocab_size: int. The vocabulary size of `token_type_ids`. + token_type_embedding_name: string. The name of the embedding table variable + for token type ids. + use_position_embeddings: bool. Whether to add position embeddings for the + position of each token in the sequence. + position_embedding_name: string. The name of the embedding table variable + for positional embeddings. 
+ initializer_range: float. Range of the weight initialization. + max_position_embeddings: int. Maximum sequence length that might ever be + used with this model. This can be longer than the sequence length of + input_tensor, but cannot be shorter. + dropout_prob: float. Dropout probability applied to the final output tensor. + + Returns: + float tensor with same shape as `input_tensor`. + + Raises: + ValueError: One of the tensor shapes or input values is invalid. + """ + input_shape = get_shape_list(input_tensor, expected_rank=3) + batch_size = input_shape[0] + seq_length = input_shape[1] + width = input_shape[2] + + output = input_tensor + + if use_token_type: + if token_type_ids is None: + raise ValueError("`token_type_ids` must be specified if" + "`use_token_type` is True.") + token_type_table = tf.get_variable( + name=token_type_embedding_name, + shape=[token_type_vocab_size, width], + initializer=create_initializer(initializer_range)) + # This vocab will be small so we always do one-hot here, since it is always + # faster for a small vocabulary. + flat_token_type_ids = tf.reshape(token_type_ids, [-1]) + one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size) + token_type_embeddings = tf.matmul(one_hot_ids, token_type_table) + token_type_embeddings = tf.reshape(token_type_embeddings, + [batch_size, seq_length, width]) + output += token_type_embeddings + + if use_position_embeddings: + assert_op = tf.assert_less_equal(seq_length, max_position_embeddings) + with tf.control_dependencies([assert_op]): + full_position_embeddings = tf.get_variable( + name=position_embedding_name, + shape=[max_position_embeddings, width], + initializer=create_initializer(initializer_range)) + # Since the position embedding table is a learned variable, we create it + # using a (long) sequence length `max_position_embeddings`. The actual + # sequence length might be shorter than this, for faster training of + # tasks that do not have long sequences. 
+ # + # So `full_position_embeddings` is effectively an embedding table + # for position [0, 1, 2, ..., max_position_embeddings-1], and the current + # sequence has positions [0, 1, 2, ... seq_length-1], so we can just + # perform a slice. + position_embeddings = tf.slice(full_position_embeddings, [0, 0], + [seq_length, -1]) + num_dims = len(output.shape.as_list()) + + # Only the last two dimensions are relevant (`seq_length` and `width`), so + # we broadcast among the first dimensions, which is typically just + # the batch size. + position_broadcast_shape = [] + for _ in range(num_dims - 2): + position_broadcast_shape.append(1) + position_broadcast_shape.extend([seq_length, width]) + position_embeddings = tf.reshape(position_embeddings, + position_broadcast_shape) + output += position_embeddings + + output = layer_norm_and_dropout(output, dropout_prob) + return output + + +def create_attention_mask_from_input_mask(from_tensor, to_mask): + """Create 3D attention mask from a 2D tensor mask. + + Args: + from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...]. + to_mask: int32 Tensor of shape [batch_size, to_seq_length]. + + Returns: + float Tensor of shape [batch_size, from_seq_length, to_seq_length]. + """ + from_shape = get_shape_list(from_tensor, expected_rank=[2, 3]) + batch_size = from_shape[0] + from_seq_length = from_shape[1] + + to_shape = get_shape_list(to_mask, expected_rank=2) + to_seq_length = to_shape[1] + + to_mask = tf.cast( + tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32) + + # We don't assume that `from_tensor` is a mask (although it could be). We + # don't actually care if we attend *from* padding tokens (only *to* padding) + # tokens so we create a tensor of all ones. + # + # `broadcast_ones` = [batch_size, from_seq_length, 1] + broadcast_ones = tf.ones( + shape=[batch_size, from_seq_length, 1], dtype=tf.float32) + + # Here we broadcast along two dimensions to create the mask. 
+ mask = broadcast_ones * to_mask + + return mask + + +def attention_layer(from_tensor, + to_tensor, + attention_mask=None, + num_attention_heads=1, + size_per_head=512, + query_act=None, + key_act=None, + value_act=None, + attention_probs_dropout_prob=0.0, + initializer_range=0.02, + do_return_2d_tensor=False, + batch_size=None, + from_seq_length=None, + to_seq_length=None): + """Performs multi-headed attention from `from_tensor` to `to_tensor`. + + This is an implementation of multi-headed attention based on "Attention + Is All You Need". If `from_tensor` and `to_tensor` are the same, then + this is self-attention. Each timestep in `from_tensor` attends to the + corresponding sequence in `to_tensor`, and returns a fixed-width vector. + + This function first projects `from_tensor` into a "query" tensor and + `to_tensor` into "key" and "value" tensors. These are (effectively) a list + of tensors of length `num_attention_heads`, where each tensor is of shape + [batch_size, seq_length, size_per_head]. + + Then, the query and key tensors are dot-producted and scaled. These are + softmaxed to obtain attention probabilities. The value tensors are then + interpolated by these probabilities, then concatenated back to a single + tensor and returned. + + In practice, the multi-headed attention is done with transposes and + reshapes rather than actual separate tensors. + + Args: + from_tensor: float Tensor of shape [batch_size, from_seq_length, + from_width]. + to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width]. + attention_mask: (optional) int32 Tensor of shape [batch_size, + from_seq_length, to_seq_length]. The values should be 1 or 0. The + attention scores will effectively be set to -infinity for any positions in + the mask that are 0, and will be unchanged for positions that are 1. + num_attention_heads: int. Number of attention heads. + size_per_head: int. Size of each attention head. 
+ query_act: (optional) Activation function for the query transform. + key_act: (optional) Activation function for the key transform. + value_act: (optional) Activation function for the value transform. + attention_probs_dropout_prob: (optional) float. Dropout probability of the + attention probabilities. + initializer_range: float. Range of the weight initializer. + do_return_2d_tensor: bool. If True, the output will be of shape [batch_size + * from_seq_length, num_attention_heads * size_per_head]. If False, the + output will be of shape [batch_size, from_seq_length, num_attention_heads + * size_per_head]. + batch_size: (Optional) int. If the input is 2D, this might be the batch size + of the 3D version of the `from_tensor` and `to_tensor`. + from_seq_length: (Optional) If the input is 2D, this might be the seq length + of the 3D version of the `from_tensor`. + to_seq_length: (Optional) If the input is 2D, this might be the seq length + of the 3D version of the `to_tensor`. + + Returns: + float Tensor of shape [batch_size, from_seq_length, + num_attention_heads * size_per_head]. (If `do_return_2d_tensor` is + true, this will be of shape [batch_size * from_seq_length, + num_attention_heads * size_per_head]). + + Raises: + ValueError: Any of the arguments or tensor shapes are invalid. 
+ """ + + def transpose_for_scores(input_tensor, batch_size, num_attention_heads, + seq_length, width): + output_tensor = tf.reshape( + input_tensor, [batch_size, seq_length, num_attention_heads, width]) + + output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3]) + return output_tensor + + from_shape = get_shape_list(from_tensor, expected_rank=[2, 3]) + to_shape = get_shape_list(to_tensor, expected_rank=[2, 3]) + + if len(from_shape) != len(to_shape): + raise ValueError( + "The rank of `from_tensor` must match the rank of `to_tensor`.") + + if len(from_shape) == 3: + batch_size = from_shape[0] + from_seq_length = from_shape[1] + to_seq_length = to_shape[1] + elif len(from_shape) == 2: + if (batch_size is None or from_seq_length is None or to_seq_length is None): + raise ValueError( + "When passing in rank 2 tensors to attention_layer, the values " + "for `batch_size`, `from_seq_length`, and `to_seq_length` " + "must all be specified.") + + # Scalar dimensions referenced here: + # B = batch size (number of sequences) + # F = `from_tensor` sequence length + # T = `to_tensor` sequence length + # N = `num_attention_heads` + # H = `size_per_head` + + from_tensor_2d = reshape_to_matrix(from_tensor) + to_tensor_2d = reshape_to_matrix(to_tensor) + + # `query_layer` = [B*F, N*H] + query_layer = tf.layers.dense( + from_tensor_2d, + num_attention_heads * size_per_head, + activation=query_act, + name="query", + kernel_initializer=create_initializer(initializer_range)) + + # `key_layer` = [B*T, N*H] + key_layer = tf.layers.dense( + to_tensor_2d, + num_attention_heads * size_per_head, + activation=key_act, + name="key", + kernel_initializer=create_initializer(initializer_range)) + + # `value_layer` = [B*T, N*H] + value_layer = tf.layers.dense( + to_tensor_2d, + num_attention_heads * size_per_head, + activation=value_act, + name="value", + kernel_initializer=create_initializer(initializer_range)) + + # `query_layer` = [B, N, F, H] + query_layer = 
transpose_for_scores(query_layer, batch_size, + num_attention_heads, from_seq_length, + size_per_head) + + # `key_layer` = [B, N, T, H] + key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads, + to_seq_length, size_per_head) + + # Take the dot product between "query" and "key" to get the raw + # attention scores. + # `attention_scores` = [B, N, F, T] + attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True) + attention_scores = tf.multiply(attention_scores, + 1.0 / math.sqrt(float(size_per_head))) + + if attention_mask is not None: + # `attention_mask` = [B, 1, F, T] + attention_mask = tf.expand_dims(attention_mask, axis=[1]) + + # Since attention_mask is 1.0 for positions we want to attend and 0.0 for + # masked positions, this operation will create a tensor which is 0.0 for + # positions we want to attend and -10000.0 for masked positions. + adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0 + + # Since we are adding it to the raw scores before the softmax, this is + # effectively the same as removing these entirely. + attention_scores += adder + + # Normalize the attention scores to probabilities. + # `attention_probs` = [B, N, F, T] + attention_probs = tf.nn.softmax(attention_scores) + + # This is actually dropping out entire tokens to attend to, which might + # seem a bit unusual, but is taken from the original Transformer paper. 
+ attention_probs = dropout(attention_probs, attention_probs_dropout_prob) + + # `value_layer` = [B, T, N, H] + value_layer = tf.reshape( + value_layer, + [batch_size, to_seq_length, num_attention_heads, size_per_head]) + + # `value_layer` = [B, N, T, H] + value_layer = tf.transpose(value_layer, [0, 2, 1, 3]) + + # `context_layer` = [B, N, F, H] + context_layer = tf.matmul(attention_probs, value_layer) + + # `context_layer` = [B, F, N, H] + context_layer = tf.transpose(context_layer, [0, 2, 1, 3]) + + if do_return_2d_tensor: + # `context_layer` = [B*F, N*H] + context_layer = tf.reshape( + context_layer, + [batch_size * from_seq_length, num_attention_heads * size_per_head]) + else: + # `context_layer` = [B, F, N*H] + context_layer = tf.reshape( + context_layer, + [batch_size, from_seq_length, num_attention_heads * size_per_head]) + + return context_layer + + +def transformer_model(input_tensor, + attention_mask=None, + hidden_size=768, + num_hidden_layers=12, + num_attention_heads=12, + intermediate_size=3072, + intermediate_act_fn=gelu, + hidden_dropout_prob=0.1, + attention_probs_dropout_prob=0.1, + initializer_range=0.02, + do_return_all_layers=False): + """Multi-headed, multi-layer Transformer from "Attention is All You Need". + + This is almost an exact implementation of the original Transformer encoder. + + See the original paper: + https://arxiv.org/abs/1706.03762 + + Also see: + https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py + + Args: + input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size]. + attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length, + seq_length], with 1 for positions that can be attended to and 0 in + positions that should not be. + hidden_size: int. Hidden size of the Transformer. + num_hidden_layers: int. Number of layers (blocks) in the Transformer. + num_attention_heads: int. Number of attention heads in the Transformer. + intermediate_size: int. 
The size of the "intermediate" (a.k.a., feed + forward) layer. + intermediate_act_fn: function. The non-linear activation function to apply + to the output of the intermediate/feed-forward layer. + hidden_dropout_prob: float. Dropout probability for the hidden layers. + attention_probs_dropout_prob: float. Dropout probability of the attention + probabilities. + initializer_range: float. Range of the initializer (stddev of truncated + normal). + do_return_all_layers: Whether to also return all layers or just the final + layer. + + Returns: + float Tensor of shape [batch_size, seq_length, hidden_size], the final + hidden layer of the Transformer. + + Raises: + ValueError: A Tensor shape or parameter is invalid. + """ + if hidden_size % num_attention_heads != 0: + raise ValueError( + "The hidden size (%d) is not a multiple of the number of attention " + "heads (%d)" % (hidden_size, num_attention_heads)) + + attention_head_size = int(hidden_size / num_attention_heads) + input_shape = get_shape_list(input_tensor, expected_rank=3) + batch_size = input_shape[0] + seq_length = input_shape[1] + input_width = input_shape[2] + + # The Transformer performs sum residuals on all layers so the input needs + # to be the same as the hidden size. + if input_width != hidden_size: + raise ValueError("The width of the input tensor (%d) != hidden size (%d)" % + (input_width, hidden_size)) + + # We keep the representation as a 2D tensor to avoid re-shaping it back and + # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on + # the GPU/CPU but may not be free on the TPU, so we want to minimize them to + # help the optimizer. 
+ prev_output = reshape_to_matrix(input_tensor) + + all_layer_outputs = [] + for layer_idx in range(num_hidden_layers): + with tf.variable_scope("layer_%d" % layer_idx): + layer_input = prev_output + + with tf.variable_scope("attention"): + attention_heads = [] + with tf.variable_scope("self"): + attention_head = attention_layer( + from_tensor=layer_input, + to_tensor=layer_input, + attention_mask=attention_mask, + num_attention_heads=num_attention_heads, + size_per_head=attention_head_size, + attention_probs_dropout_prob=attention_probs_dropout_prob, + initializer_range=initializer_range, + do_return_2d_tensor=True, + batch_size=batch_size, + from_seq_length=seq_length, + to_seq_length=seq_length) + attention_heads.append(attention_head) + + attention_output = None + if len(attention_heads) == 1: + attention_output = attention_heads[0] + else: + # In the case where we have other sequences, we just concatenate + # them to the self-attention head before the projection. + attention_output = tf.concat(attention_heads, axis=-1) + + # Run a linear projection of `hidden_size` then add a residual + # with `layer_input`. + with tf.variable_scope("output"): + attention_output = tf.layers.dense( + attention_output, + hidden_size, + kernel_initializer=create_initializer(initializer_range)) + attention_output = dropout(attention_output, hidden_dropout_prob) + attention_output = layer_norm(attention_output + layer_input) + + # The activation is only applied to the "intermediate" hidden layer. + with tf.variable_scope("intermediate"): + intermediate_output = tf.layers.dense( + attention_output, + intermediate_size, + activation=intermediate_act_fn, + kernel_initializer=create_initializer(initializer_range)) + + # Down-project back to `hidden_size` then add the residual. 
+ with tf.variable_scope("output"): + layer_output = tf.layers.dense( + intermediate_output, + hidden_size, + kernel_initializer=create_initializer(initializer_range)) + layer_output = dropout(layer_output, hidden_dropout_prob) + layer_output = layer_norm(layer_output + attention_output) + prev_output = layer_output + all_layer_outputs.append(layer_output) + + if do_return_all_layers: + final_outputs = [] + for layer_output in all_layer_outputs: + final_output = reshape_from_matrix(layer_output, input_shape) + final_outputs.append(final_output) + return final_outputs + else: + final_output = reshape_from_matrix(prev_output, input_shape) + return final_output + + +def get_shape_list(tensor, expected_rank=None, name=None): + """Returns a list of the shape of tensor, preferring static dimensions. + + Args: + tensor: A tf.Tensor object to find the shape of. + expected_rank: (optional) int. The expected rank of `tensor`. If this is + specified and the `tensor` has a different rank, and exception will be + thrown. + name: Optional name of the tensor for the error message. + + Returns: + A list of dimensions of the shape of tensor. All static dimensions will + be returned as python integers, and dynamic dimensions will be returned + as tf.Tensor scalars. + """ + if name is None: + name = tensor.name + + if expected_rank is not None: + assert_rank(tensor, expected_rank, name) + + shape = tensor.shape.as_list() + + non_static_indexes = [] + for (index, dim) in enumerate(shape): + if dim is None: + non_static_indexes.append(index) + + if not non_static_indexes: + return shape + + dyn_shape = tf.shape(tensor) + for index in non_static_indexes: + shape[index] = dyn_shape[index] + return shape + + +def reshape_to_matrix(input_tensor): + """Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).""" + ndims = input_tensor.shape.ndims + if ndims < 2: + raise ValueError("Input tensor must have at least rank 2. 
Shape = %s" % + (input_tensor.shape)) + if ndims == 2: + return input_tensor + + width = input_tensor.shape[-1] + output_tensor = tf.reshape(input_tensor, [-1, width]) + return output_tensor + + +def reshape_from_matrix(output_tensor, orig_shape_list): + """Reshapes a rank 2 tensor back to its original rank >= 2 tensor.""" + if len(orig_shape_list) == 2: + return output_tensor + + output_shape = get_shape_list(output_tensor) + + orig_dims = orig_shape_list[0:-1] + width = output_shape[-1] + + return tf.reshape(output_tensor, orig_dims + [width]) + + +def assert_rank(tensor, expected_rank, name=None): + """Raises an exception if the tensor rank is not of the expected rank. + + Args: + tensor: A tf.Tensor to check the rank of. + expected_rank: Python integer or list of integers, expected rank. + name: Optional name of the tensor for the error message. + + Raises: + ValueError: If the expected shape doesn't match the actual shape. + """ + if name is None: + name = tensor.name + + expected_rank_dict = {} + if isinstance(expected_rank, six.integer_types): + expected_rank_dict[expected_rank] = True + else: + for x in expected_rank: + expected_rank_dict[x] = True + + actual_rank = tensor.shape.ndims + if actual_rank not in expected_rank_dict: + scope_name = tf.get_variable_scope().name + raise ValueError( + "For the tensor `%s` in scope `%s`, the actual rank " + "`%d` (shape = %s) is not equal to the expected rank `%s`" % + (name, scope_name, actual_rank, str(tensor.shape), str(expected_rank))) diff --git a/extractive-summarization/preprocessing-data-bert.ipynb b/extractive-summarization/preprocessing-data-bert.ipynb new file mode 100644 index 0000000..9ab3b4f --- /dev/null +++ b/extractive-summarization/preprocessing-data-bert.ipynb @@ -0,0 +1,369 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# !pip3 install malaya\n", + "\n", + "import malaya\n", + "import re\n", + "from 
malaya.texts._text_functions import split_into_sentences\n", + "from malaya.texts import _regex\n", + "\n", + "splitter = split_into_sentences" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "from bert import tokenization\n", + "\n", + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "tokenizer = tokenization.FullTokenizer(vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "92579" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import glob\n", + "\n", + "stories = glob.glob('cnn/stories/*.story')\n", + "len(stories)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "def split_story(doc):\n", + " index = doc.find('@highlight')\n", + " story, highlights = doc[:index], doc[index:].split('@highlight')\n", + " highlights = [h.strip() for h in highlights if len(h) > 0]\n", + " stories = []\n", + " for s in splitter(story):\n", + " stories.append(s.split())\n", + " summaries = []\n", + " for s in highlights:\n", + " summaries.append(s.split())\n", + " return stories, summaries" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "min_src_nsents = 3\n", + "max_src_nsents = 20\n", + 
"min_src_ntokens_per_sent = 5\n", + "max_src_ntokens_per_sent = 30\n", + "min_tgt_ntokens = 5\n", + "max_tgt_ntokens = 500\n", + "sep_token = '[SEP]'\n", + "cls_token = '[CLS]'\n", + "pad_token = '[PAD]'\n", + "tgt_bos = '[unused0]'\n", + "tgt_eos = '[unused1]'\n", + "tgt_sent_split = '[unused2]'\n", + "sep_vid = tokenizer.vocab[sep_token]\n", + "cls_vid = tokenizer.vocab[cls_token]\n", + "pad_vid = tokenizer.vocab[pad_token]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "with open(stories[0]) as fopen:\n", + " story = fopen.read()\n", + "story, highlights = split_story(story)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "def _get_ngrams(n, text):\n", + " ngram_set = set()\n", + " text_length = len(text)\n", + " max_index_ngram_start = text_length - n\n", + " for i in range(max_index_ngram_start + 1):\n", + " ngram_set.add(tuple(text[i:i + n]))\n", + " return ngram_set\n", + "\n", + "\n", + "def _get_word_ngrams(n, sentences):\n", + " assert len(sentences) > 0\n", + " assert n > 0\n", + "\n", + " words = sum(sentences, [])\n", + " return _get_ngrams(n, words)\n", + "\n", + "def cal_rouge(evaluated_ngrams, reference_ngrams):\n", + " reference_count = len(reference_ngrams)\n", + " evaluated_count = len(evaluated_ngrams)\n", + "\n", + " overlapping_ngrams = evaluated_ngrams.intersection(reference_ngrams)\n", + " overlapping_count = len(overlapping_ngrams)\n", + "\n", + " if evaluated_count == 0:\n", + " precision = 0.0\n", + " else:\n", + " precision = overlapping_count / evaluated_count\n", + "\n", + " if reference_count == 0:\n", + " recall = 0.0\n", + " else:\n", + " recall = overlapping_count / reference_count\n", + "\n", + " f1_score = 2.0 * ((precision * recall) / (precision + recall + 1e-8))\n", + " return {\"f\": f1_score, \"p\": precision, \"r\": recall}\n", + "\n", + "\n", + "def greedy_selection(doc_sent_list, 
abstract_sent_list, summary_size):\n", + " def _rouge_clean(s):\n", + " return re.sub(r'[^a-zA-Z0-9 ]', '', s)\n", + "\n", + " max_rouge = 0.0\n", + " abstract = sum(abstract_sent_list, [])\n", + " abstract = _rouge_clean(' '.join(abstract)).split()\n", + " sents = [_rouge_clean(' '.join(s)).split() for s in doc_sent_list]\n", + " evaluated_1grams = [_get_word_ngrams(1, [sent]) for sent in sents]\n", + " reference_1grams = _get_word_ngrams(1, [abstract])\n", + " evaluated_2grams = [_get_word_ngrams(2, [sent]) for sent in sents]\n", + " reference_2grams = _get_word_ngrams(2, [abstract])\n", + "\n", + " selected = []\n", + " for s in range(summary_size):\n", + " cur_max_rouge = max_rouge\n", + " cur_id = -1\n", + " for i in range(len(sents)):\n", + " if (i in selected):\n", + " continue\n", + " c = selected + [i]\n", + " candidates_1 = [evaluated_1grams[idx] for idx in c]\n", + " candidates_1 = set.union(*map(set, candidates_1))\n", + " candidates_2 = [evaluated_2grams[idx] for idx in c]\n", + " candidates_2 = set.union(*map(set, candidates_2))\n", + " rouge_1 = cal_rouge(candidates_1, reference_1grams)['f']\n", + " rouge_2 = cal_rouge(candidates_2, reference_2grams)['f']\n", + " rouge_score = rouge_1 + rouge_2\n", + " if rouge_score > cur_max_rouge:\n", + " cur_max_rouge = rouge_score\n", + " cur_id = i\n", + " if (cur_id == -1):\n", + " return selected\n", + " selected.append(cur_id)\n", + " max_rouge = cur_max_rouge\n", + "\n", + " return sorted(selected)\n", + "\n", + "def get_xy(story, highlights):\n", + " # keep only sentences longer than the minimum token count\n", + " idxs = [i for i, s in enumerate(story) if (len(s) > min_src_ntokens_per_sent)]\n", + "\n", + " src = [story[i][:max_src_ntokens_per_sent] for i in idxs]\n", + " src = src[:max_src_nsents]\n", + "\n", + " sent_labels = greedy_selection(src, highlights, 3)\n", + "\n", + " _sent_labels = [0] * len(src)\n", + " for l in sent_labels:\n", + " _sent_labels[l] = 1\n", + " 
\n", + " \n", + " src_txt = [' '.join(sent) for sent in src]\n", + " src_subtokens = []\n", + " for i, text in enumerate(src_txt):\n", + " text = tokenizer.tokenize(text)\n", + " if i > 0:\n", + " text = ['[SEP]','[CLS]'] + text\n", + " src_subtokens.extend(text)\n", + " \n", + " src_subtokens = [cls_token] + src_subtokens + [sep_token]\n", + " src_subtoken_idxs = tokenizer.convert_tokens_to_ids(src_subtokens)\n", + " \n", + " _segs = [-1] + [i for i, t in enumerate(src_subtoken_idxs) if t == sep_vid]\n", + " segs = [_segs[i] - _segs[i - 1] for i in range(1, len(_segs))]\n", + " segments_ids = []\n", + " for i, s in enumerate(segs):\n", + " if (i % 2 == 0):\n", + " segments_ids += s * [0]\n", + " else:\n", + " segments_ids += s * [1]\n", + " cls_ids = [i for i, t in enumerate(src_subtoken_idxs) if t == cls_vid]\n", + " \n", + " return src_subtoken_idxs, cls_ids, _sent_labels, segments_ids" ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "with open(stories[1]) as fopen:\n", + " story = fopen.read()\n", + "story, highlights = split_story(story)\n", + "text, cls_ids, sent_labels, segments_ids = get_xy(story, highlights)" ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(20, 20, 661, 661)" ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } ], + "source": [ + "len(sent_labels), len(cls_ids), len(text), len(segments_ids)" ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 92579/92579 [13:52<00:00, 111.15it/s]\n" ] + } ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "texts, clss, labels, segments = [], [], [], []\n", + "\n", + "for i in tqdm(range(len(stories))):\n", + " with open(stories[i]) as fopen:\n", + " story = fopen.read()\n", + 
story, highlights = split_story(story)\n", + " text, cls_ids, sent_labels, segments_ids = get_xy(story, highlights)\n", + " if len(cls_ids) != len(sent_labels):\n", + " continue\n", + " texts.append(text)\n", + " clss.append(cls_ids)\n", + " labels.append(sent_labels)\n", + " segments.append(segments_ids)" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_texts, test_texts, train_clss, test_clss, train_labels, test_labels, train_segments, test_segments = \\\n", + "train_test_split(texts, clss, labels, segments, test_size = 0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "with open('dataset-bert.pkl', 'wb') as fopen:\n", + " pickle.dump({'train_texts': train_texts,\n", + " 'test_texts': test_texts,\n", + " 'train_clss': train_clss,\n", + " 'test_clss': test_clss,\n", + " 'train_labels': train_labels,\n", + " 'test_labels': test_labels,\n", + " 'train_segments': train_segments,\n", + " 'test_segments': test_segments}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/extractive-summarization/preprocessing-data.ipynb b/extractive-summarization/preprocessing-data.ipynb new file mode 100644 index 0000000..6c4de05 --- /dev/null +++ b/extractive-summarization/preprocessing-data.ipynb @@ -0,0 +1,448 @@ +{ + "cells": [ + { + "cell_type": "code", 
+ "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = 
np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 
1)])\n" + ] + } + ], + "source": [ + "# !pip3 install malaya\n", + "\n", + "import malaya\n", + "import re\n", + "from malaya.texts._text_functions import split_into_sentences\n", + "from malaya.texts import _regex\n", + "\n", + "tokenizer = malaya.preprocessing._tokenizer\n", + "splitter = split_into_sentences" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "92579" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import glob\n", + "\n", + "stories = glob.glob('cnn/stories/*.story')\n", + "len(stories)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def is_number_regex(s):\n", + " if re.match(\"^\\d+?\\.\\d+?$\", s) is None:\n", + " return s.isdigit()\n", + " return True\n", + "\n", + "def preprocessing(string):\n", + " string = re.sub('[^\\'\"A-Za-z\\-(),.$0-9 ]+', ' ', string.lower())\n", + " tokenized = tokenizer(string)\n", + " tokens = []\n", + " for w in tokenized:\n", + " if is_number_regex(w):\n", + " tokens.append('')\n", + " elif re.match(_regex._money, w):\n", + " tokens.append('')\n", + " elif re.match(_regex._date, w):\n", + " tokens.append('')\n", + " else:\n", + " tokens.append(w)\n", + " return tokens\n", + "\n", + "def split_story(doc):\n", + " index = doc.find('@highlight')\n", + " story, highlights = doc[:index], doc[index:].split('@highlight')\n", + " highlights = [h.strip() for h in highlights if len(h) > 0]\n", + " stories = []\n", + " for s in splitter(story):\n", + " stories.append(preprocessing(s))\n", + " summaries = []\n", + " for s in highlights:\n", + " summaries.append(preprocessing(s))\n", + " return stories, summaries" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "min_src_nsents = 3\n", + "max_src_nsents = 20\n", + "min_src_ntokens_per_sent = 5\n", + 
"max_src_ntokens_per_sent = 30\n", + "min_tgt_ntokens = 5\n", + "max_tgt_ntokens = 500\n", + "sep_token = '[SEP]'\n", + "cls_token = '[CLS]'" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "with open(stories[0]) as fopen:\n", + " story = fopen.read()\n", + "story, highlights = split_story(story)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "def _get_ngrams(n, text):\n", + " ngram_set = set()\n", + " text_length = len(text)\n", + " max_index_ngram_start = text_length - n\n", + " for i in range(max_index_ngram_start + 1):\n", + " ngram_set.add(tuple(text[i:i + n]))\n", + " return ngram_set\n", + "\n", + "\n", + "def _get_word_ngrams(n, sentences):\n", + " assert len(sentences) > 0\n", + " assert n > 0\n", + "\n", + " words = sum(sentences, [])\n", + " return _get_ngrams(n, words)\n", + "\n", + "def cal_rouge(evaluated_ngrams, reference_ngrams):\n", + " reference_count = len(reference_ngrams)\n", + " evaluated_count = len(evaluated_ngrams)\n", + "\n", + " overlapping_ngrams = evaluated_ngrams.intersection(reference_ngrams)\n", + " overlapping_count = len(overlapping_ngrams)\n", + "\n", + " if evaluated_count == 0:\n", + " precision = 0.0\n", + " else:\n", + " precision = overlapping_count / evaluated_count\n", + "\n", + " if reference_count == 0:\n", + " recall = 0.0\n", + " else:\n", + " recall = overlapping_count / reference_count\n", + "\n", + " f1_score = 2.0 * ((precision * recall) / (precision + recall + 1e-8))\n", + " return {\"f\": f1_score, \"p\": precision, \"r\": recall}\n", + "\n", + "\n", + "def greedy_selection(doc_sent_list, abstract_sent_list, summary_size):\n", + " def _rouge_clean(s):\n", + " return re.sub(r'[^a-zA-Z0-9 ]', '', s)\n", + "\n", + " max_rouge = 0.0\n", + " abstract = sum(abstract_sent_list, [])\n", + " abstract = _rouge_clean(' '.join(abstract)).split()\n", + " sents = [_rouge_clean(' '.join(s)).split() for 
s in doc_sent_list]\n", + " evaluated_1grams = [_get_word_ngrams(1, [sent]) for sent in sents]\n", + " reference_1grams = _get_word_ngrams(1, [abstract])\n", + " evaluated_2grams = [_get_word_ngrams(2, [sent]) for sent in sents]\n", + " reference_2grams = _get_word_ngrams(2, [abstract])\n", + "\n", + " selected = []\n", + " for s in range(summary_size):\n", + " cur_max_rouge = max_rouge\n", + " cur_id = -1\n", + " for i in range(len(sents)):\n", + " if (i in selected):\n", + " continue\n", + " c = selected + [i]\n", + " candidates_1 = [evaluated_1grams[idx] for idx in c]\n", + " candidates_1 = set.union(*map(set, candidates_1))\n", + " candidates_2 = [evaluated_2grams[idx] for idx in c]\n", + " candidates_2 = set.union(*map(set, candidates_2))\n", + " rouge_1 = cal_rouge(candidates_1, reference_1grams)['f']\n", + " rouge_2 = cal_rouge(candidates_2, reference_2grams)['f']\n", + " rouge_score = rouge_1 + rouge_2\n", + " if rouge_score > cur_max_rouge:\n", + " cur_max_rouge = rouge_score\n", + " cur_id = i\n", + " if (cur_id == -1):\n", + " return selected\n", + " selected.append(cur_id)\n", + " max_rouge = cur_max_rouge\n", + "\n", + " return sorted(selected)\n", + "\n", + "def get_xy(story, highlights):\n", + " # keep only sentences longer than the minimum token count\n", + " idxs = [i for i, s in enumerate(story) if (len(s) > min_src_ntokens_per_sent)]\n", + "\n", + " src = [story[i][:max_src_ntokens_per_sent] for i in idxs]\n", + " src = src[:max_src_nsents]\n", + "\n", + " sent_labels = greedy_selection(src, highlights, 3)\n", + "\n", + " _sent_labels = [0] * len(src)\n", + " for l in sent_labels:\n", + " _sent_labels[l] = 1\n", + " \n", + " src_txt = [' '.join(sent) for sent in src]\n", + " text = ' {} {} '.format(sep_token, cls_token).join(src_txt)\n", + " text = '[CLS] %s [SEP]'%(text)\n", + " cls_ids = [i for i, t in enumerate(text.split()) if t == cls_token]\n", + " \n", + " return text, cls_ids, 
_sent_labels" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import json\n", + "\n", + "def build_dataset(words, n_words, atleast=1):\n", + " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", + " counter = collections.Counter(words).most_common(n_words)\n", + " counter = [i for i in counter if i[1] >= atleast]\n", + " count.extend(counter)\n", + " dictionary = dict()\n", + " for word, _ in count:\n", + " dictionary[word] = len(dictionary)\n", + " data = list()\n", + " unk_count = 0\n", + " for word in words:\n", + " index = dictionary.get(word, 0)\n", + " if index == 0:\n", + " unk_count += 1\n", + " data.append(index)\n", + " count[0][1] = unk_count\n", + " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", + " return data, count, dictionary, reversed_dictionary" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# from dask import delayed\n", + "# import dask\n", + "\n", + "# def process(i):\n", + "# with open(stories[i]) as fopen:\n", + "# story = fopen.read()\n", + "# story, highlights = split_story(story)\n", + "# return get_xy(story, highlights)\n", + "\n", + "# train = []\n", + "# for i in range(len(stories)):\n", + "# im = delayed(process)(i)\n", + "# train.append(im)\n", + " \n", + "# train = dask.compute(*train)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "with open(stories[1]) as fopen:\n", + " story = fopen.read()\n", + "story, highlights = split_story(story)\n", + "text, cls_ids, sent_labels = get_xy(story, highlights)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(20, 20, 560)" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(sent_labels), 
len(cls_ids), len(text.split())" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 92579/92579 [16:41<00:00, 92.41it/s] \n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "texts, clss, labels = [], [], []\n", + "\n", + "for i in tqdm(range(len(stories))):\n", + " with open(stories[i]) as fopen:\n", + " story = fopen.read()\n", + " story, highlights = split_story(story)\n", + " text, cls_ids, sent_labels = get_xy(story, highlights)\n", + " if len(cls_ids) != len(sent_labels):\n", + " continue\n", + " texts.append(text)\n", + " clss.append(cls_ids)\n", + " labels.append(sent_labels)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "vocab from size: 118356\n", + "Most common words [('the', 1974502), (',', 1740960), ('[CLS]', 1668596), ('[SEP]', 1668596), ('.', 1284463), ('to', 844716)]\n" + ] + } + ], + "source": [ + "concat = ' '.join(texts).split()\n", + "vocabulary_size = len(list(set(concat)))\n", + "_, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size, atleast = 2)\n", + "print('vocab from size: %d'%(len(dictionary)))\n", + "print('Most common words', count[4:10])" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_texts, test_texts, train_clss, test_clss, train_labels, test_labels = \\\n", + "train_test_split(texts, clss, labels, test_size = 0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "with open('dataset.pkl', 'wb') as fopen:\n", + " pickle.dump({'train_texts': train_texts,\n", + " 'test_texts': test_texts,\n", + " 'train_clss': train_clss,\n", + " 
'test_clss': test_clss,\n", + " 'train_labels': train_labels,\n", + " 'test_labels': test_labels}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dictionary.pkl', 'wb') as fopen:\n", + " pickle.dump({'dictionary': dictionary, 'rev_dictionary': rev_dictionary}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/1.basic-seq2seq-manual.ipynb b/neural-machine-translation/1.basic-seq2seq-manual.ipynb deleted file mode 100644 index 22a3711..0000000 --- a/neural-machine-translation/1.basic-seq2seq-manual.ipynb +++ /dev/null @@ -1,416 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in 
words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 
668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = 
tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " sequence_length=self.X_seq_len,\n", - " dtype = tf.float32)\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " sequence_length=self.X_seq_len,\n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :6: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", - 
"Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.604396, avg accuracy: 0.881736\n", - "epoch: 2, avg loss: 0.833020, avg accuracy: 0.912527\n", - "epoch: 3, avg loss: 0.771523, avg accuracy: 0.914118\n", - "epoch: 4, avg loss: 0.753420, avg accuracy: 0.913445\n", - "epoch: 5, avg loss: 0.721722, avg accuracy: 0.913764\n", - "epoch: 6, avg loss: 0.699771, avg accuracy: 0.915882\n", - "epoch: 7, avg loss: 0.731362, avg accuracy: 0.915182\n", - "epoch: 8, avg loss: 0.840826, avg accuracy: 0.909773\n", - "epoch: 9, avg loss: 0.744763, avg accuracy: 0.914564\n", - "epoch: 10, avg loss: 0.716952, avg accuracy: 0.915045\n", - "epoch: 11, avg loss: 0.724257, avg accuracy: 0.913836\n", - "epoch: 12, avg loss: 1.140763, avg accuracy: 0.863182\n", - "epoch: 13, avg loss: 0.987183, avg accuracy: 0.909855\n", - "epoch: 14, avg loss: 0.821809, avg accuracy: 0.913555\n", - "epoch: 15, avg loss: 0.750108, avg accuracy: 0.914145\n", - "epoch: 16, avg loss: 0.754669, avg accuracy: 0.915127\n", - "epoch: 17, avg loss: 0.750940, avg accuracy: 0.914536\n", - "epoch: 18, avg loss: 0.734306, avg accuracy: 0.914864\n", - "epoch: 19, avg loss: 0.711908, avg accuracy: 0.916727\n", - "epoch: 20, avg loss: 0.707829, avg accuracy: 0.915255\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", 
- " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: there were a lot of risks involved that they talked about during the informed consent portion .\n", - "REAL ANSWER: có rất nhiều nguy cơ liên quan mà họ nói đến trong phần thông báo sự chấp thuận .\n", - "PREDICTED ANSWER: và tôi thể , tôi , , , và . . , và , , , , " . \n", - "\n", - "row 2\n", - "QUESTION: any powerful technology is inherently dual use , and , you know , you get something like synthetic biology , nanobiotechnology , it really compels you , you have to look at both the amateur groups but also the professional groups , because they have better infrastructure , they have better facilities , and they have access to pathogens .\n", - "REAL ANSWER: bất kì kỹ thuật nào cũng như con dao hai lưỡi , và , bạn biết rằng , khi bạn có những thứ như sinh học tổng hợp , công nghệ sinh học nano , bạn buộc phải nhìn vào không chỉ những nhóm nghiệp dư mà cả những nhóm chuyên nghiệp , vì họ có cơ sở hạ tầng tốt hơn , họ có điều kiện thuận lợi hơn , và họ có thể tiếp cận các tác nhân gây bệnh .\n", - "PREDICTED ANSWER: và tôi tôi , tôi , . , , , , , , , , , , , , . , , , , , , , , , , , . một , , , và , và , , . , , , , , , . , , , , và . \n", - "\n", - "row 3\n", - "QUESTION: and it 's not just water that this works with .\n", - "REAL ANSWER: không chỉ có nước\n", - "PREDICTED ANSWER: và tôi là có một \n", - "\n", - "row 4\n", - "QUESTION: but what you 're seeing here is 160 to 175 degrees , and anything over 150 is superhydrophobic .\n", - "REAL ANSWER: chúng ta đang nhìn thấy đây là 160 - 175 độ , và từ 150 độ trở đi là chống thấm cực tốt rồi .\n", - "PREDICTED ANSWER: và tôi tôi , , , . , . chúng , , , , , , . , và . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/1.basic-seq2seq.ipynb b/neural-machine-translation/1.basic-seq2seq.ipynb new file mode 100644 index 0000000..af1dd27 --- /dev/null +++ b/neural-machine-translation/1.basic-seq2seq.ipynb @@ -0,0 +1,795 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { 
+ "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 1\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [1] for i in train_Y]\n", + "test_Y = [i + [1] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with 
tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " 
self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :28: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :31: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :39: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + 
"sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 1808, 12696, 8452, 21731, 19939, 30043, 2790, 15632,\n", + " 14991, 23571, 25971, 29248, 21537, 23904, 16404, 23591, 23368,\n", + " 21622, 10809, 28732, 24242, 5333, 7897, 21326, 5480, 11738,\n", + " 3247, 27617, 21457, 25716, 17247, 27192, 17667, 8035, 13181,\n", + " 10914]],\n", + " \n", + " [[ 1, 21336, 21300, 21405, 24213, 15055, 22111, 16214, 13292,\n", + " 6687, 29254, 15263, 5125, 16368, 6456, 7106, 25411, 672,\n", + " 21021, 3301, 20996, 29582, 9421, 20400, 21641, 6185, 27892,\n", + " 18989, 25355, 8685, 29759, 21437, 13441, 31125, 2979, 19404,\n", + " 12722]],\n", + " \n", + " [[ 1, 7926, 5672, 18899, 16910, 17422, 4547, 19039, 29544,\n", + " 23979, 25975, 28500, 15936, 27635, 8380, 27892, 5190, 21965,\n", + " 4205, 17290, 13397, 23260, 14851, 30852, 15764, 20643, 8223,\n", + " 8489, 18448, 6146, 31828, 20410, 9419, 22658, 8652, 21392,\n", + " 181]],\n", + " \n", + " [[ 1, 22546, 10482, 5492, 2655, 7222, 892, 3538, 2658,\n", + " 11978, 28157, 18782, 1906, 18031, 9122, 7804, 3217, 30528,\n", + " 30860, 14464, 8526, 27934, 30045, 15255, 7565, 8752, 2292,\n", + " 5362, 3350, 10977, 20091, 11219, 10431, 19383, 15866, 20297,\n", + " 6170]],\n", + " \n", + " [[ 1, 23687, 26989, 1703, 25570, 11806, 22868, 14454, 21294,\n", + " 2805, 13434, 25496, 36, 25419, 18132, 22722, 26147, 29806,\n", + " 28436, 22607, 16378, 28672, 6158, 244, 8290, 15508, 28359,\n", + " 22409, 10521, 18784, 20865, 17084, 11266, 12177, 30762, 23635,\n", + " 10597]],\n", + " \n", + " [[ 1, 20644, 7521, 13216, 3506, 
26905, 28752, 5156, 184,\n", + " 10623, 21038, 11544, 3863, 281, 9139, 1304, 19312, 10478,\n", + " 6300, 11178, 1346, 30472, 26665, 20247, 26942, 16522, 22669,\n", + " 17467, 28752, 12791, 1381, 8063, 17494, 21605, 31581, 2556,\n", + " 31181]],\n", + " \n", + " [[ 1, 30269, 27388, 3168, 17688, 22580, 3315, 28312, 6546,\n", + " 5370, 20560, 21847, 9305, 1620, 1414, 8663, 27933, 11972,\n", + " 25492, 8276, 11705, 3050, 31867, 91, 31432, 7096, 30914,\n", + " 24039, 28127, 2793, 24057, 29349, 1687, 2714, 21329, 8324,\n", + " 22387]],\n", + " \n", + " [[ 1, 6366, 1375, 3043, 26497, 10677, 1857, 19086, 10266,\n", + " 23383, 4350, 5581, 30528, 15468, 6116, 30563, 18376, 23884,\n", + " 29387, 19645, 19099, 9928, 9927, 25478, 2669, 11290, 18126,\n", + " 15327, 17062, 21438, 9476, 13066, 31857, 30285, 7215, 12387,\n", + " 30039]],\n", + " \n", + " [[ 1, 19081, 3477, 8257, 15002, 7448, 23627, 18929, 31372,\n", + " 16188, 17015, 8075, 23225, 26131, 1227, 14557, 9321, 29432,\n", + " 3477, 5109, 572, 291, 30918, 7826, 24605, 26347, 26629,\n", + " 7984, 24024, 3423, 23946, 21702, 25515, 27258, 2879, 23326,\n", + " 16852]],\n", + " \n", + " [[ 1, 21776, 7004, 21758, 26648, 3685, 2535, 336, 23971,\n", + " 5260, 18235, 8149, 29412, 10299, 30340, 16015, 8964, 25094,\n", + " 20866, 6141, 10987, 9748, 23864, 23861, 2710, 10120, 26537,\n", + " 27214, 16598, 26699, 25621, 26998, 20326, 28584, 5452, 23343,\n", + " 22581]]], dtype=int32), 10.37746, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 
100%|██████████| 1563/1563 [05:04<00:00, 5.13it/s, accuracy=0.0739, cost=7.46]\n", + "minibatch loop: 100%|██████████| 40/40 [00:04<00:00, 9.47it/s, accuracy=0.0806, cost=7.29]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6.319555e-05" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/10.basic-birnn-seq2seq-contrib-greedy.ipynb b/neural-machine-translation/10.basic-birnn-seq2seq-contrib-greedy.ipynb new file mode 100644 index 0000000..3050779 --- /dev/null +++ b/neural-machine-translation/10.basic-birnn-seq2seq-contrib-greedy.ipynb @@ -0,0 +1,783 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + 
"source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = 
encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw, state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = 
tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :39: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, 
embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 7041, 18531, 30711, 1739, 12146, 2835, 7798, 2890, 2698,\n", + " 4719, 18187, 4328, 13869, 31295, 9404, 11027, 2934, 14448,\n", + " 30491, 27088, 12032, 24876, 19813, 29033, 21136, 22400, 15029,\n", + " 12060, 25025, 25914, 27492, 24334, 11805, 9301, 7573, 19687,\n", + " 6376, 3552, 9032, 23418, 16544, 17757, 9092, 29078, 9458,\n", + " 5691, 19373, 30539, 27605, 19633, 25595, 22955, 23336, 30068,\n", + " 4279, 4003, 13061, 29064, 30123, 13727, 2609, 12456, 7578,\n", + " 6108, 17477, 10421, 12575, 18437, 25522, 25121, 22630, 21740],\n", + " [17956, 3084, 1602, 11595, 18838, 11749, 19924, 10231, 1185,\n", + " 15923, 7877, 29854, 8062, 14342, 23727, 18701, 25000, 4564,\n", + " 1755, 13671, 5859, 30610, 29094, 13241, 17423, 6377, 3206,\n", + " 1816, 29632, 24088, 30386, 16604, 30927, 5336, 20420, 12963,\n", + " 16035, 826, 15211, 19, 11203, 8852, 29168, 15080, 548,\n", + " 9467, 27338, 26813, 23767, 26826, 23329, 5731, 12611, 12769,\n", + " 20602, 29295, 24485, 22562, 31061, 26486, 19587, 6464, 1659,\n", + " 15537, 7086, 11594, 4456, 7365, 28589, 12069, 5379, 23940],\n", + " [24287, 6622, 26697, 5806, 8509, 31323, 25700, 12244, 21479,\n", + " 29272, 8562, 2417, 16977, 542, 13391, 16581, 5801, 2842,\n", + " 4594, 31750, 6788, 16642, 29070, 31356, 7956, 15051, 1527,\n", + " 28689, 19813, 19982, 4048, 18723, 14201, 5397, 10154, 7791,\n", + " 4996, 26423, 10785, 21109, 15222, 30243, 13574, 23318, 25355,\n", + " 4995, 8047, 16354, 16466, 17162, 29037, 10959, 23531, 6551,\n", + " 22921, 28139, 18178, 6177, 25487, 27066, 28978, 6655, 2871,\n", + " 21418, 28256, 14788, 15964, 26146, 
24943, 27779, 29896, 31231],\n", + " [ 1213, 2139, 25206, 29848, 2248, 7939, 28622, 127, 23131,\n", + " 29587, 25165, 22939, 15491, 16739, 26216, 3775, 8442, 25903,\n", + " 7081, 764, 7104, 26867, 13887, 9999, 10353, 9093, 4827,\n", + " 24866, 21092, 24444, 26027, 8502, 20754, 573, 11101, 29992,\n", + " 3484, 8112, 769, 3380, 28716, 30484, 15058, 29557, 26274,\n", + " 6190, 27913, 23646, 15042, 7784, 24751, 5212, 4825, 11556,\n", + " 4883, 12557, 26915, 25425, 7602, 15263, 1879, 11413, 22126,\n", + " 27173, 3384, 17076, 22162, 26013, 9855, 26389, 1812, 11296],\n", + " [30213, 18705, 27449, 5694, 7725, 18404, 10356, 16191, 19169,\n", + " 22948, 18699, 14335, 13199, 25534, 22777, 4945, 18039, 30861,\n", + " 27505, 15302, 13610, 23905, 26258, 4255, 27012, 5035, 12157,\n", + " 30397, 13388, 9564, 3450, 29957, 12976, 11828, 2870, 3198,\n", + " 26318, 29743, 8872, 15142, 26242, 50, 2533, 2260, 5472,\n", + " 4782, 15399, 9210, 10597, 13652, 10183, 366, 20493, 11648,\n", + " 25490, 11960, 21872, 22124, 21164, 16082, 25091, 29776, 4667,\n", + " 27886, 29010, 29926, 15470, 16066, 5632, 15848, 16501, 30455],\n", + " [26333, 30670, 16701, 31728, 7357, 27886, 27574, 14845, 23144,\n", + " 23937, 20931, 31297, 30271, 6269, 16326, 8241, 25022, 19709,\n", + " 15763, 4113, 25592, 30940, 25789, 17536, 15242, 31987, 30252,\n", + " 28258, 19332, 21136, 3211, 14415, 22463, 28393, 1782, 28046,\n", + " 20855, 29400, 24223, 5822, 26327, 13556, 13641, 22655, 29854,\n", + " 1146, 18503, 7376, 5490, 3884, 21766, 20064, 23732, 28678,\n", + " 4386, 2537, 19182, 9624, 10661, 4077, 30203, 6496, 22385,\n", + " 6957, 10980, 2969, 30136, 9747, 31730, 14798, 29781, 31501],\n", + " [30689, 30441, 31520, 13662, 4845, 26075, 10535, 10357, 2314,\n", + " 7810, 31562, 31494, 311, 30786, 26453, 10890, 13866, 585,\n", + " 5996, 8922, 11751, 23579, 4963, 5381, 12277, 26603, 29692,\n", + " 20342, 16655, 20177, 5839, 25700, 4736, 30724, 24179, 21867,\n", + " 20166, 28966, 23898, 7652, 13727, 20743, 14062, 
14274, 19401,\n", + " 1019, 12308, 6800, 7450, 24664, 31500, 7926, 3446, 24319,\n", + " 25694, 2012, 16689, 11734, 20727, 13891, 5734, 28552, 26659,\n", + " 5161, 5813, 9338, 13546, 30032, 24671, 31951, 28362, 27494],\n", + " [27568, 340, 26866, 1605, 13764, 18107, 18157, 9436, 9747,\n", + " 30195, 16769, 10958, 3723, 19904, 16105, 2189, 6900, 21155,\n", + " 9509, 14309, 9634, 12316, 11871, 25243, 16089, 24901, 15244,\n", + " 26327, 26191, 22777, 27724, 21782, 28035, 18526, 6819, 27298,\n", + " 25503, 1290, 24602, 6245, 14147, 625, 1287, 18207, 5806,\n", + " 7557, 11076, 7202, 22972, 30194, 6143, 14564, 11025, 30769,\n", + " 12232, 11113, 13612, 13336, 9369, 11963, 13693, 25051, 1986,\n", + " 13564, 24319, 18580, 21078, 16227, 19822, 31061, 7211, 7010],\n", + " [22716, 28845, 13280, 17424, 6138, 13170, 31255, 11617, 157,\n", + " 6402, 11747, 12332, 23219, 15092, 23059, 27819, 29829, 24490,\n", + " 14225, 20204, 13408, 30356, 16557, 26487, 18706, 5961, 31715,\n", + " 4781, 19833, 8919, 25509, 2732, 28163, 11171, 21279, 10728,\n", + " 390, 1085, 12106, 20674, 24696, 27451, 15125, 11452, 3958,\n", + " 658, 1008, 14455, 22592, 18288, 27654, 17757, 7200, 21822,\n", + " 12697, 494, 21814, 30800, 4767, 17327, 8675, 18452, 20190,\n", + " 12228, 5213, 16729, 8452, 975, 7528, 7088, 3076, 15083],\n", + " [14037, 12934, 11930, 22488, 27104, 19320, 6705, 15700, 16062,\n", + " 30604, 27614, 21608, 15224, 26566, 19773, 24901, 6707, 11140,\n", + " 12413, 9754, 12152, 22577, 4513, 4350, 13905, 23688, 16950,\n", + " 18915, 10013, 8002, 22765, 5849, 15762, 23161, 24139, 27021,\n", + " 21783, 21978, 29381, 31546, 6197, 2408, 24289, 29512, 6558,\n", + " 5445, 1719, 14683, 3961, 19021, 29250, 10375, 20188, 4875,\n", + " 22831, 4003, 18923, 3571, 16411, 8397, 24353, 24806, 477,\n", + " 19856, 29000, 619, 31080, 14070, 2010, 9911, 20947, 15563]],\n", + " dtype=int32), 10.376687, 0.0]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + 
"source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [08:26<00:00, 3.09it/s, accuracy=0.187, cost=5.31]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 5.75it/s, accuracy=0.199, cost=4.67]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/10.basic-birnn-seq2seq-greedy.ipynb b/neural-machine-translation/10.basic-birnn-seq2seq-greedy.ipynb deleted file mode 100644 index 5b8c9cf..0000000 --- a/neural-machine-translation/10.basic-birnn-seq2seq-greedy.ipynb +++ /dev/null @@ -1,439 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - 
"import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['tôi tiếp tục làm thí nghiệm này 1 thời gian',\n", - " 'và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .']" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "text_to[-2:]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), 
('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - 
"metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.BasicRNNCell(size, reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw, state_bw), -1)\n", - " self.encoder_state = tuple([bi_state] * num_layers)\n", - " \n", - " self.encoder_state = tuple(self.encoder_state[-1] for _ in range(num_layers))\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)])\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = 
tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, 
tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :7: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - 
] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.625805, avg accuracy: 0.062322\n", - "epoch: 2, avg loss: 5.949493, avg accuracy: 0.094182\n", - "epoch: 3, avg loss: 5.650787, avg accuracy: 0.130502\n", - "epoch: 4, avg loss: 5.315249, avg accuracy: 0.152491\n", - "epoch: 5, avg loss: 4.955022, avg accuracy: 0.186465\n", - "epoch: 6, avg loss: 4.574212, avg accuracy: 0.227074\n", - "epoch: 7, avg loss: 4.171162, avg accuracy: 0.276648\n", - "epoch: 8, avg loss: 3.761979, avg accuracy: 0.335339\n", - "epoch: 9, avg loss: 3.358583, avg accuracy: 0.400926\n", - "epoch: 10, avg loss: 2.987428, avg accuracy: 0.464893\n", - "epoch: 11, avg loss: 2.648347, avg accuracy: 0.539903\n", - "epoch: 12, avg loss: 2.340445, avg accuracy: 0.605599\n", - "epoch: 13, avg loss: 2.074902, avg accuracy: 0.658513\n", - "epoch: 14, avg loss: 1.810939, avg accuracy: 0.716842\n", - "epoch: 15, avg loss: 1.573249, avg accuracy: 0.765040\n", - "epoch: 16, avg loss: 1.346753, avg accuracy: 0.812594\n", - "epoch: 17, avg loss: 1.142557, avg accuracy: 0.856812\n", - "epoch: 18, avg loss: 0.958146, avg accuracy: 0.890111\n", - "epoch: 19, avg loss: 0.788740, avg accuracy: 0.925607\n", - "epoch: 20, avg loss: 0.637479, avg accuracy: 0.957355\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " 
total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/11.lstm-birnn-seq2seq-contrib-greedy.ipynb b/neural-machine-translation/11.lstm-birnn-seq2seq-contrib-greedy.ipynb new file mode 100644 index 0000000..fe3af22 --- /dev/null +++ b/neural-machine-translation/11.lstm-birnn-seq2seq-contrib-greedy.ipynb @@ -0,0 +1,776 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + 
"test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state_c = tf.concat((state_fw.c, 
state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " encoder_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " 
self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. 
This can '\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 4339, 1876, 9014, 11717, 11717, 11717, 15753, 15753, 15753,\n", + " 15753, 4605, 4605, 4605, 4605, 22586, 22586, 22586, 22586,\n", + " 19122, 19122, 2038, 13432, 13432, 13432, 1221, 1221, 12985,\n", + " 12985, 12985, 28363, 6470, 2202, 2202, 11028, 11028, 13253,\n", + " 8829, 24849, 24849, 7118, 7118, 7118, 7118, 15194, 15194,\n", + " 20280, 20280, 15194, 29661, 29661, 15320, 29661, 15320, 12648,\n", + " 12648, 12648, 12648, 12648, 16292, 16292, 16292, 16292, 6454,\n", + " 6454, 13379, 13379, 3862, 3862, 3862, 3862, 19532, 1910],\n", + " [14882, 14882, 14882, 24400, 17865, 26020, 26020, 22443, 12389,\n", + " 12389, 12389, 28876, 14998, 14998, 12973, 12973, 24198, 24198,\n", + " 13693, 19200, 320, 320, 21229, 18734, 18734, 27689, 2162,\n", + " 21474, 27705, 27705, 13538, 13538, 25752, 25752, 25752, 25752,\n", + " 6927, 3657, 3657, 3657, 7357, 7357, 13991, 13991, 13991,\n", + " 13991, 16511, 16511, 16511, 16511, 18389, 9395, 9395, 65,\n", + " 65, 65, 65, 65, 65, 31071, 23053, 23053, 23053,\n", + " 23053, 14868, 21129, 21129, 21129, 21129, 4810, 4810, 4810],\n", + " [19283, 11746, 11746, 10581, 10916, 14548, 152, 152, 152,\n", + " 29980, 29980, 8162, 8162, 18710, 18710, 18710, 18710, 8885,\n", + " 8885, 9488, 9488, 9488, 9488, 9488, 3611, 19408, 14450,\n", + " 14450, 14450, 16338, 9324, 9324, 9324, 15787, 28008, 28008,\n", + " 20413, 4200, 4200, 4200, 25092, 25092, 24267, 24267, 24267,\n", + " 14633, 14633, 12156, 12156, 
1748, 4142, 4142, 4142, 7824,\n", + " 7824, 7624, 27562, 9829, 9829, 9829, 25306, 10663, 14430,\n", + " 14430, 14430, 14430, 14430, 1485, 28775, 28775, 28775, 28775],\n", + " [ 9010, 9010, 30614, 30614, 7426, 7426, 5279, 30910, 30910,\n", + " 1141, 1141, 297, 297, 22346, 4652, 4652, 7795, 5813,\n", + " 23624, 5614, 23624, 3705, 10906, 18198, 15156, 30139, 30139,\n", + " 2742, 2742, 23242, 23242, 30524, 4219, 28196, 28196, 3335,\n", + " 3335, 21167, 21167, 21167, 4782, 4782, 24931, 24931, 24931,\n", + " 15266, 15266, 15266, 2976, 29479, 25284, 29479, 29479, 361,\n", + " 21583, 12288, 26061, 26061, 12581, 12976, 26061, 1023, 1023,\n", + " 20414, 22083, 22286, 20682, 20682, 20682, 20682, 20682, 7765],\n", + " [21372, 21372, 26244, 9332, 9332, 9332, 24486, 21308, 21308,\n", + " 21308, 21308, 21308, 29422, 29422, 29422, 25367, 25367, 25367,\n", + " 11872, 29999, 11872, 29999, 17294, 19622, 19622, 19622, 16706,\n", + " 16706, 16706, 8577, 8577, 16803, 17517, 17517, 17517, 17517,\n", + " 882, 882, 882, 882, 10969, 10969, 10969, 10969, 13280,\n", + " 14325, 13280, 14325, 8818, 8818, 28286, 28286, 28286, 28286,\n", + " 15013, 15013, 20119, 20119, 14983, 14983, 14983, 26374, 26374,\n", + " 26374, 26374, 18198, 18198, 18198, 9687, 400, 400, 31251],\n", + " [21642, 11824, 21642, 22898, 22898, 22898, 309, 309, 21418,\n", + " 12175, 12175, 14445, 14445, 29484, 29484, 22806, 8874, 8874,\n", + " 8874, 8874, 31275, 31275, 31275, 21320, 21320, 21320, 20317,\n", + " 20317, 6039, 6039, 6039, 6039, 6039, 4496, 4496, 5026,\n", + " 5026, 5026, 5026, 21488, 6438, 6438, 6438, 13792, 13792,\n", + " 13792, 21101, 21101, 21101, 28488, 28488, 28488, 28488, 28488,\n", + " 28488, 18929, 2047, 2047, 2047, 2047, 10644, 10644, 30708,\n", + " 30708, 30708, 26584, 25651, 15400, 15400, 15400, 25651, 25651],\n", + " [31492, 26949, 16555, 24015, 21365, 30205, 30205, 26065, 26065,\n", + " 20325, 23564, 17687, 12023, 12023, 13855, 13855, 13855, 13855,\n", + " 12023, 4369, 4369, 10998, 10998, 14720, 14720, 
16237, 16237,\n", + " 16237, 16237, 16237, 15944, 15944, 8649, 8649, 31473, 7057,\n", + " 7057, 7057, 7057, 12464, 12464, 12464, 12464, 12464, 20706,\n", + " 24891, 24891, 20706, 1328, 1328, 4919, 4919, 8093, 30015,\n", + " 24188, 20618, 20687, 21216, 21216, 26429, 26429, 26429, 26429,\n", + " 29491, 22973, 20968, 15618, 20968, 14601, 4976, 20011, 17397],\n", + " [28173, 28173, 29119, 9357, 30232, 27389, 25008, 25008, 10107,\n", + " 10107, 28532, 19256, 26673, 26673, 21079, 455, 1574, 30760,\n", + " 30760, 27333, 15881, 15881, 11589, 24626, 24626, 8875, 8875,\n", + " 8875, 8009, 9257, 9257, 13632, 13632, 26901, 21366, 26901,\n", + " 9797, 17973, 17973, 17042, 7373, 19834, 19834, 19834, 30268,\n", + " 21169, 18190, 18190, 24022, 24022, 24022, 30089, 30089, 30089,\n", + " 3825, 3825, 3825, 3825, 3190, 3190, 28386, 28386, 26011,\n", + " 26011, 7722, 7722, 7722, 3582, 3582, 30216, 30216, 30216],\n", + " [18650, 14565, 14565, 2391, 11571, 11571, 27061, 27061, 21426,\n", + " 5087, 5087, 5087, 5087, 20237, 24943, 24943, 12513, 12513,\n", + " 12513, 12513, 8223, 1625, 29974, 19518, 19518, 7722, 7722,\n", + " 7722, 7722, 11916, 11916, 11916, 11916, 25395, 25395, 25395,\n", + " 14005, 14005, 11635, 11635, 10501, 11635, 13248, 21330, 5762,\n", + " 5762, 5762, 26014, 21902, 21902, 3041, 31341, 31341, 31341,\n", + " 23214, 23214, 6311, 6311, 13578, 17559, 17559, 23214, 28568,\n", + " 28568, 12763, 28568, 7793, 27030, 17518, 17518, 27520, 22779],\n", + " [19035, 19035, 10493, 10493, 1213, 1213, 21578, 21578, 15312,\n", + " 28399, 20919, 10355, 20919, 10355, 8429, 19064, 31783, 31783,\n", + " 31783, 31783, 31783, 1266, 1266, 29805, 28008, 28008, 28008,\n", + " 9048, 9048, 9048, 9048, 5167, 5167, 25738, 25738, 28421,\n", + " 28421, 28421, 24520, 28488, 28488, 28488, 31355, 5373, 5373,\n", + " 15397, 15397, 15397, 22291, 22291, 638, 638, 638, 29395,\n", + " 29395, 29395, 29395, 12870, 13745, 13745, 13745, 13745, 2405,\n", + " 3287, 13628, 13628, 15595, 15595, 15595, 15595, 15595, 
3681]],\n", + " dtype=int32), 10.373779, 0.0]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:18<00:00, 2.30it/s, accuracy=0.199, cost=5.13]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.87it/s, accuracy=0.215, cost=4.58]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/11.lstm-birnn-seq2seq-greedy.ipynb b/neural-machine-translation/11.lstm-birnn-seq2seq-greedy.ipynb deleted file mode 100644 index 37f7742..0000000 --- a/neural-machine-translation/11.lstm-birnn-seq2seq-greedy.ipynb +++ /dev/null @@ -1,424 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": 
[ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['tôi tiếp tục làm thí nghiệm này 1 thời gian',\n", - " 'và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .']" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "text_to[-2:]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - 
{ - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": 
[ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", - " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " self.encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " self.encoder_state = tuple(self.encoder_state[-1] for _ in range(num_layers))\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " 
decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)])\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", 
- " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 
15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.629360, avg accuracy: 0.052543\n", - "epoch: 2, avg loss: 6.113452, avg accuracy: 0.077949\n", - "epoch: 3, avg loss: 5.992076, avg accuracy: 0.088334\n", - "epoch: 4, avg loss: 5.891252, avg accuracy: 0.110016\n", - "epoch: 5, avg loss: 5.819877, avg accuracy: 0.114913\n", - "epoch: 6, avg loss: 5.760571, avg accuracy: 0.120810\n", - "epoch: 7, avg loss: 5.691116, avg accuracy: 0.125001\n", - "epoch: 8, avg loss: 5.606406, avg accuracy: 0.132078\n", - "epoch: 9, avg loss: 5.512288, avg accuracy: 0.137671\n", - "epoch: 10, avg loss: 5.429221, avg accuracy: 0.144395\n", - "epoch: 11, avg loss: 5.329564, avg accuracy: 0.154254\n", - "epoch: 12, avg loss: 5.229920, avg accuracy: 0.158055\n", - "epoch: 13, avg loss: 5.134067, avg accuracy: 0.167498\n", - "epoch: 14, avg loss: 5.047961, avg accuracy: 0.171943\n", - "epoch: 15, avg loss: 4.971337, avg accuracy: 0.174937\n", - "epoch: 16, avg loss: 4.887931, avg accuracy: 0.179139\n", - "epoch: 17, avg loss: 4.813043, avg accuracy: 0.182352\n", - "epoch: 18, avg loss: 4.724143, avg accuracy: 0.187680\n", - "epoch: 19, avg loss: 4.620947, avg accuracy: 0.194570\n", - "epoch: 20, avg loss: 4.508872, avg accuracy: 0.202628\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " 
print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi có thể làm bạn có thể làm bạn có thể làm bạn có thể làm gì ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: tôi có thể làm , tôi sẽ làm việc đó . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm ra , tôi sẽ làm được tôi ? \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi có thể làm , bạn có thể làm , bạn có thể làm bạn có thể làm bạn có thể làm gì ? 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/12.gru-birnn-seq2seq-contrib-greedy.ipynb b/neural-machine-translation/12.gru-birnn-seq2seq-contrib-greedy.ipynb new file mode 100644 index 0000000..9c2ff4a --- /dev/null +++ b/neural-machine-translation/12.gru-birnn-seq2seq-contrib-greedy.ipynb @@ -0,0 +1,807 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + 
"metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw, state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " 
decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, 
tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(correct_index)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :30: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :39: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * 
https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 8180, 8180, 13143, 13143, 12937, 6992, 4556, 4556, 9512,\n", + " 24078, 24078, 20187, 20187, 20187, 20187, 20187, 5649, 5649,\n", + " 5649, 17526, 17526, 17526, 17526, 10614, 10614, 10614, 17526,\n", + " 4803, 4803, 18919, 3277, 14446, 14446, 283, 283, 283,\n", + " 283, 29788, 29788, 1788, 9858, 9858, 28365, 28365, 14894,\n", + " 14894, 8917, 8917, 8917, 8917, 8917, 8917, 8917, 8917,\n", + " 24207, 3448, 3448, 2827, 2827, 16308, 16308, 22395, 22395,\n", + " 22395, 22395, 13721, 13721, 13721, 13721, 13721, 13721, 20560],\n", + " [14031, 14031, 30325, 30325, 28000, 28000, 18092, 18092, 18092,\n", + " 11295, 11295, 1162, 1162, 1162, 10536, 10536, 10536, 25376,\n", + " 25376, 27602, 27602, 27602, 9451, 9451, 9451, 25158, 25158,\n", + " 25158, 29397, 29397, 2948, 13433, 13433, 13433, 26233, 21482,\n", + " 15779, 15779, 15779, 6, 5723, 5723, 5723, 5723, 4474,\n", + " 4474, 10896, 10896, 25678, 25678, 25678, 25678, 24216, 24216,\n", + " 24216, 25678, 25678, 26546, 26546, 26546, 26546, 26937, 26937,\n", + " 26937, 26937, 26937, 13523, 13523, 5507, 5507, 5507, 12237],\n", + " [ 3492, 29282, 5618, 5618, 5618, 31371, 15253, 15253, 9594,\n", + " 26231, 23601, 23601, 23601, 23601, 17818, 17818, 790, 790,\n", + " 28328, 28328, 28328, 28328, 28328, 21269, 
25599, 25599, 20777,\n", + " 20777, 15509, 7506, 7506, 7506, 7506, 16357, 16357, 9381,\n", + " 18494, 18494, 18494, 311, 311, 311, 23309, 23309, 29333,\n", + " 29333, 12678, 12678, 12678, 1266, 1266, 1266, 9769, 12202,\n", + " 12202, 12202, 12202, 27408, 27408, 27408, 31808, 26305, 26305,\n", + " 21743, 21743, 21743, 21743, 17298, 17298, 23703, 26614, 26614],\n", + " [28527, 11998, 11760, 325, 2733, 13822, 13822, 13822, 13822,\n", + " 13822, 22734, 16097, 645, 645, 8043, 8043, 8043, 13197,\n", + " 13197, 31343, 31343, 31343, 25295, 29300, 10570, 10570, 17455,\n", + " 25557, 25557, 25557, 26040, 26040, 22983, 22983, 22983, 22983,\n", + " 20252, 20252, 635, 635, 3348, 3348, 22729, 21346, 21346,\n", + " 16021, 16021, 16021, 16021, 12749, 12749, 12749, 30041, 4115,\n", + " 13923, 13923, 13923, 13936, 13936, 13936, 2401, 2401, 8540,\n", + " 8540, 8540, 8540, 14287, 14287, 14287, 29668, 4830, 14553],\n", + " [ 851, 12789, 29757, 29757, 29757, 6024, 6024, 10217, 10217,\n", + " 23363, 29120, 29120, 29120, 29120, 16748, 16748, 16748, 16748,\n", + " 17155, 17155, 27051, 27051, 27051, 29052, 28869, 28869, 28869,\n", + " 28869, 22650, 519, 519, 30067, 30067, 10499, 10499, 5401,\n", + " 5401, 5401, 5401, 16051, 16051, 7814, 7814, 7814, 29090,\n", + " 29090, 25969, 25969, 25969, 18542, 1156, 1156, 1156, 26643,\n", + " 16021, 20097, 20097, 6948, 30, 9205, 9205, 27371, 27371,\n", + " 13483, 16565, 16565, 11133, 11133, 11133, 31288, 31288, 19912],\n", + " [13022, 13022, 16371, 16371, 16371, 3368, 3368, 36, 36,\n", + " 36, 14968, 9807, 9807, 17203, 9472, 9472, 9472, 30524,\n", + " 30524, 17294, 17294, 17294, 17294, 17294, 15839, 15839, 29746,\n", + " 29746, 8460, 8460, 8460, 27139, 27139, 27139, 12978, 12978,\n", + " 24962, 29047, 29047, 29047, 1305, 1305, 1365, 1365, 21700,\n", + " 8701, 8701, 8701, 16461, 16461, 23803, 1572, 1572, 1572,\n", + " 19844, 7597, 2980, 13411, 13411, 13411, 13411, 27237, 13411,\n", + " 7861, 7861, 7861, 594, 594, 594, 12504, 27329, 27329],\n", + " 
[23638, 15744, 15744, 29832, 29832, 29832, 16877, 16877, 12119,\n", + " 12119, 26912, 26912, 26912, 26912, 6204, 12434, 12434, 12434,\n", + " 3266, 3266, 19456, 19456, 30330, 6281, 30711, 30711, 25006,\n", + " 25006, 3627, 3627, 1801, 29015, 29015, 26619, 6086, 29585,\n", + " 29585, 29585, 29585, 29585, 2929, 2929, 2929, 5909, 13884,\n", + " 31205, 31205, 31205, 27991, 27991, 27991, 23283, 23283, 23283,\n", + " 9129, 26896, 26896, 26896, 24066, 24066, 24066, 24066, 11033,\n", + " 1223, 1223, 1223, 16497, 16497, 16497, 16497, 30817, 30817],\n", + " [31090, 31090, 25561, 25561, 11564, 8687, 12853, 1309, 1309,\n", + " 1046, 8824, 1046, 25155, 9532, 9532, 9532, 9532, 9532,\n", + " 30968, 30968, 30968, 3295, 3295, 3295, 6262, 16557, 7667,\n", + " 6262, 8595, 26006, 26006, 26006, 26006, 24995, 24995, 24995,\n", + " 3736, 31373, 31373, 31373, 27791, 27791, 30818, 30818, 27791,\n", + " 27791, 10868, 10868, 10868, 26871, 26871, 643, 9724, 5304,\n", + " 5304, 5304, 5304, 5304, 16036, 39, 18772, 18772, 11257,\n", + " 11257, 17203, 17203, 17203, 3303, 3303, 3303, 3303, 3303],\n", + " [ 5876, 5876, 5876, 5876, 5876, 29670, 3887, 21589, 21589,\n", + " 21589, 8786, 17782, 17782, 17782, 8173, 16003, 16003, 21320,\n", + " 21320, 21320, 21320, 21320, 21320, 17757, 16209, 21990, 21990,\n", + " 21990, 18542, 18542, 18542, 23119, 23119, 13178, 13178, 13178,\n", + " 13178, 25956, 25956, 12976, 12976, 9614, 26583, 26583, 20131,\n", + " 20131, 20131, 14467, 14467, 22113, 28238, 28595, 28595, 28595,\n", + " 28595, 4721, 4721, 4721, 8331, 8331, 8331, 10321, 15239,\n", + " 15239, 3135, 3135, 3611, 3611, 31969, 13551, 11837, 23029],\n", + " [21806, 16704, 27802, 30047, 30047, 26475, 11241, 19047, 19047,\n", + " 19047, 11241, 11241, 11241, 18943, 18943, 18943, 18943, 18943,\n", + " 18943, 18943, 25417, 8242, 23925, 23925, 23925, 23925, 23925,\n", + " 23925, 17979, 17979, 17979, 30056, 30056, 30056, 9887, 9887,\n", + " 6893, 26938, 26938, 26938, 26938, 429, 429, 8681, 8681,\n", + " 24514, 
24514, 2940, 2940, 2940, 21755, 21755, 15425, 28381,\n", + " 15425, 29021, 2353, 2353, 2353, 3880, 3880, 3880, 3880,\n", + " 3880, 15365, 5083, 15762, 17282, 17282, 14841, 14841, 14841]],\n", + " dtype=int32), 10.374331, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:24<00:00, 2.28it/s, accuracy=0.257, cost=4.7] \n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.86it/s, accuracy=0.296, cost=4.07]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/12.gru-birnn-seq2seq-greedy.ipynb b/neural-machine-translation/12.gru-birnn-seq2seq-greedy.ipynb deleted file mode 100644 index da588f8..0000000 --- 
a/neural-machine-translation/12.gru-birnn-seq2seq-greedy.ipynb +++ /dev/null @@ -1,421 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['tôi tiếp tục làm thí nghiệm này 1 thời gian',\n", - " 'và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .']" - ] - }, - 
"execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "text_to[-2:]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - 
"PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size, reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw, state_bw), -1)\n", - " self.encoder_state = tuple([bi_state] * num_layers)\n", - " \n", - " self.encoder_state = tuple(self.encoder_state[-1] for _ in range(num_layers))\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " 
decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)])\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", 
- " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 
15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.614276, avg accuracy: 0.050476\n", - "epoch: 2, avg loss: 6.205797, avg accuracy: 0.054426\n", - "epoch: 3, avg loss: 6.095009, avg accuracy: 0.079415\n", - "epoch: 4, avg loss: 5.981892, avg accuracy: 0.100438\n", - "epoch: 5, avg loss: 5.901337, avg accuracy: 0.109276\n", - "epoch: 6, avg loss: 5.792093, avg accuracy: 0.113754\n", - "epoch: 7, avg loss: 5.641451, avg accuracy: 0.126755\n", - "epoch: 8, avg loss: 5.470055, avg accuracy: 0.138390\n", - "epoch: 9, avg loss: 5.289564, avg accuracy: 0.149933\n", - "epoch: 10, avg loss: 5.075469, avg accuracy: 0.172347\n", - "epoch: 11, avg loss: 4.851264, avg accuracy: 0.189333\n", - "epoch: 12, avg loss: 4.627014, avg accuracy: 0.205995\n", - "epoch: 13, avg loss: 4.382761, avg accuracy: 0.229566\n", - "epoch: 14, avg loss: 4.137850, avg accuracy: 0.257346\n", - "epoch: 15, avg loss: 3.912164, avg accuracy: 0.286553\n", - "epoch: 16, avg loss: 3.698820, avg accuracy: 0.320739\n", - "epoch: 17, avg loss: 3.450189, avg accuracy: 0.356606\n", - "epoch: 18, avg loss: 3.196571, avg accuracy: 0.406529\n", - "epoch: 19, avg loss: 2.960001, avg accuracy: 0.449249\n", - "epoch: 20, avg loss: 2.760768, avg accuracy: 0.484461\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " 
print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? , bạn sẽ hỏi họ bạn có đau không ? , bạn sẽ hỏi họ bạn có thể làm đau đau đau đau đau đau đau đau đau đau đau đau \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa 2 kiểu cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/13.basic-seq2seq-luong.ipynb b/neural-machine-translation/13.basic-seq2seq-luong.ipynb index aea7d14..88f077e 100644 --- a/neural-machine-translation/13.basic-seq2seq-luong.ipynb +++ b/neural-machine-translation/13.basic-seq2seq-luong.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" ] }, { @@ -21,93 +16,42 @@ "metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index 
== 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], + "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { @@ -116,10 +60,8 @@ "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -128,265 +70,717 @@ "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", " \n", " self.X = 
tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", 
+ " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = 
tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "size_layer = 256\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "WARNING:tensorflow:From :6: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: 
BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + 
"WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" ] } ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", "sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, 
"outputs": [], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(174, 220)" + "[array([[[ 1, 27596, 15058, 2924, 5882, 6397, 29587, 31461, 8350,\n", + " 22072, 2044, 13758, 16254, 14291, 23526, 6488, 9524, 16930,\n", + " 14951, 18239, 25162, 539, 25930, 27188, 29207, 20051, 15548,\n", + " 16022, 4815, 4771, 19075, 14876, 13972, 12, 31170, 10489,\n", + " 7344]],\n", + " \n", + " [[ 1, 7360, 11925, 20374, 17881, 25103, 3970, 242, 8167,\n", + " 31870, 5012, 538, 6484, 17786, 11558, 19342, 25839, 29601,\n", + " 799, 10085, 1200, 26253, 3995, 3398, 20489, 17403, 29792,\n", + " 10201, 30725, 1811, 12774, 11033, 238, 26535, 15345, 8130,\n", + " 22401]],\n", + " \n", + " [[ 1, 5921, 21908, 4617, 30062, 8172, 27859, 310, 26276,\n", + " 29514, 9994, 18406, 5875, 10137, 22469, 27792, 1652, 19706,\n", + " 25799, 29427, 23297, 31273, 17661, 10066, 26453, 1196, 955,\n", + " 16781, 14645, 1653, 25320, 24628, 2384, 17661, 17152, 21436,\n", + " 21658]],\n", + " \n", + " [[ 1, 21515, 3230, 31639, 30889, 13093, 18355, 6290, 3008,\n", + " 30501, 3085, 30281, 9219, 8727, 25591, 4701, 24831, 10191,\n", + " 14810, 28602, 11756, 19503, 2527, 727, 12500, 7722, 9977,\n", + " 7367, 30485, 15424, 29197, 4344, 18668, 31812, 21254, 7313,\n", + " 5195]],\n", + " \n", + " [[ 1, 14393, 20850, 29444, 22271, 21046, 22520, 30570, 21533,\n", + " 2265, 13426, 22017, 16062, 7375, 8653, 21268, 26283, 31210,\n", + " 2846, 7588, 20501, 10560, 11313, 16779, 21178, 31337, 31213,\n", + " 14054, 1733, 9426, 9878, 30736, 11741, 18051, 1248, 637,\n", + " 20110]],\n", + " \n", + " [[ 1, 29285, 9174, 23052, 23279, 23486, 6816, 2003, 19224,\n", + " 17533, 5484, 2273, 15402, 13204, 2103, 5219, 16116, 1578,\n", + " 823, 26916, 5357, 17096, 4147, 10053, 20938, 28860, 
13580,\n", + " 14735, 16614, 834, 14981, 10955, 24084, 17120, 13546, 22535,\n", + " 29305]],\n", + " \n", + " [[ 1, 7172, 17728, 994, 18811, 28142, 8098, 21858, 21170,\n", + " 17543, 31481, 18600, 11686, 26528, 23233, 2944, 23379, 27350,\n", + " 5096, 13907, 14786, 647, 19748, 29882, 11277, 6127, 12400,\n", + " 2273, 28941, 26191, 19230, 20852, 12040, 25189, 22582, 28042,\n", + " 16348]],\n", + " \n", + " [[ 1, 17728, 29428, 31794, 29801, 11875, 16605, 16957, 10814,\n", + " 30889, 2762, 10944, 755, 25085, 5285, 6217, 31843, 25201,\n", + " 31726, 10971, 28107, 22255, 15704, 1373, 19618, 24813, 28372,\n", + " 8780, 12478, 5278, 729, 29055, 19571, 31014, 14898, 14967,\n", + " 7742]],\n", + " \n", + " [[ 1, 15577, 17527, 12210, 2537, 31746, 17234, 3044, 2724,\n", + " 13463, 3368, 19271, 23381, 1799, 31160, 15737, 22752, 362,\n", + " 12663, 29280, 5829, 10902, 21653, 1212, 20517, 24306, 2386,\n", + " 28447, 29583, 23112, 1300, 2877, 1337, 11027, 19129, 19723,\n", + " 6497]],\n", + " \n", + " [[ 1, 19222, 15829, 15808, 11394, 13204, 9152, 12789, 1349,\n", + " 16745, 15300, 5622, 10390, 7480, 7904, 12515, 28611, 8853,\n", + " 15266, 7155, 12664, 17535, 16674, 18758, 1149, 12656, 31539,\n", + " 23785, 31813, 14694, 18188, 1390, 8870, 18717, 191, 15100,\n", + " 2846]]], dtype=int32), 10.375248, 0.0]" ] }, - "execution_count": 13, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - 
" padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 12, "metadata": {}, "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [06:47<00:00, 3.83it/s, accuracy=0.0875, cost=7.18]\n", + "minibatch loop: 100%|██████████| 40/40 [00:05<00:00, 7.33it/s, accuracy=0.0968, cost=7.07]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "8.97118e-05" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] } ], "metadata": { diff --git a/neural-machine-translation/14.lstm-seq2seq-luong.ipynb b/neural-machine-translation/14.lstm-seq2seq-luong.ipynb index 85f6479..4c73de2 100644 --- a/neural-machine-translation/14.lstm-seq2seq-luong.ipynb +++ b/neural-machine-translation/14.lstm-seq2seq-luong.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" ] }, { @@ -21,93 +16,42 @@ "metadata": {}, "outputs": [], "source": [ - "def 
build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], + "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, 
rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { @@ -116,10 +60,8 @@ "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -128,208 +70,693 @@ "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + 
"from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", " \n", " self.X = tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " 
sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = 
tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "size_layer = 128\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, 
{ "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", 
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", 
"sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(174, 220)" + "[array([[[ 1, 11384, 156, 25249, 25249, 20948, 20948, 20948, 20948,\n", + " 20948, 19153, 9072, 20427, 20427, 20427, 17640, 17640, 17745,\n", + " 17640, 17745, 8047, 8047, 8047, 298, 30431, 21464, 21464,\n", + " 21464, 25557, 25557, 10389, 10389, 7755, 7755, 13797, 8054,\n", + " 8054]],\n", + " \n", + " [[ 1, 26786, 26786, 3769, 3769, 3769, 10479, 30326, 10479,\n", + " 10479, 28758, 28758, 28758, 5723, 5723, 28758, 26049, 23921,\n", + " 23921, 23921, 6841, 6841, 13303, 30190, 30190, 30190, 30190,\n", + " 26982, 26982, 26982, 16855, 16855, 16855, 7604, 7604, 28847,\n", + " 5231]],\n", + " \n", + " [[ 1, 12601, 13766, 4975, 4975, 3488, 3488, 3488, 3488,\n", + " 3488, 13643, 12133, 21723, 26334, 7654, 7654, 7654, 3799,\n", + " 3799, 13357, 13357, 13357, 13357, 31248, 7655, 7655, 27641,\n", + " 25167, 25167, 25167, 25167, 31248, 1984, 23881, 24459, 24459,\n", + " 24459]],\n", + " \n", + " [[ 1, 6828, 6828, 18659, 29627, 24190, 21607, 21607, 1068,\n", + " 1068, 20963, 20963, 20963, 20963, 487, 487, 8442, 7831,\n", + " 11165, 11165, 11165, 11165, 7831, 24561, 30898, 30898, 1663,\n", + " 1663, 1663, 4256, 4256, 4256, 4256, 12423, 12423, 24636,\n", + " 12241]],\n", + " \n", + " [[ 1, 30783, 12042, 3742, 3742, 
3742, 12291, 12291, 6732,\n", + " 6732, 30527, 14239, 14239, 4831, 4831, 27717, 8854, 8854,\n", + " 21476, 19415, 24592, 28889, 24592, 24592, 10608, 10608, 10127,\n", + " 10127, 14360, 14360, 14360, 29006, 8802, 29147, 18478, 18478,\n", + " 18478]],\n", + " \n", + " [[ 1, 18142, 25124, 25124, 25124, 25124, 19813, 19813, 19813,\n", + " 19813, 19813, 19813, 19813, 19813, 28047, 28047, 21885, 30394,\n", + " 31337, 143, 143, 30385, 25270, 25270, 25270, 8233, 3962,\n", + " 3962, 8233, 3407, 385, 18822, 18822, 18822, 18822, 6523,\n", + " 6523]],\n", + " \n", + " [[ 1, 8900, 8900, 4610, 8016, 4610, 20415, 20415, 20048,\n", + " 15217, 15217, 15217, 20733, 13230, 13230, 13230, 22991, 30035,\n", + " 30035, 4548, 4548, 4548, 4548, 1923, 1923, 1923, 25121,\n", + " 29120, 5988, 5988, 5988, 5988, 17829, 17829, 11153, 11153,\n", + " 11153]],\n", + " \n", + " [[ 1, 14529, 1948, 15936, 13552, 15914, 15914, 4111, 4111,\n", + " 22941, 22941, 22941, 31804, 31804, 2874, 2874, 2874, 20963,\n", + " 20963, 6105, 29588, 29588, 16315, 10377, 11687, 15873, 15873,\n", + " 17850, 18353, 18353, 20008, 20008, 20008, 13984, 13231, 24714,\n", + " 13231]],\n", + " \n", + " [[ 1, 24519, 24519, 26295, 20667, 5049, 8762, 8762, 569,\n", + " 7719, 7719, 7719, 14457, 27467, 13856, 19801, 19801, 31182,\n", + " 31182, 31182, 31182, 4100, 4100, 4100, 18016, 18016, 18016,\n", + " 9361, 20044, 20044, 2749, 2749, 2749, 20354, 20354, 21445,\n", + " 3192]],\n", + " \n", + " [[ 1, 25608, 25608, 25608, 25608, 14497, 14497, 25608, 25608,\n", + " 12590, 12590, 12590, 12590, 12590, 30877, 30877, 30877, 30877,\n", + " 23617, 23617, 23617, 409, 20579, 20579, 7880, 7880, 3911,\n", + " 12614, 28596, 25381, 25381, 25381, 25381, 22478, 22478, 22478,\n", + " 22478]]], dtype=int32), 10.373493, 0.0]" ] }, - "execution_count": 13, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", + 
"batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", "\n", - "maxlen_question, maxlen_answer" + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:02<00:00, 2.88it/s, accuracy=0.127, cost=6.46]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 6.38it/s, accuracy=0.14, cost=6.29] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" ] }, { @@ -338,38 +765,19 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: i don 't know if that 's correct english .\n", - "REAL ANSWER: tôi không biết câu đó có đúng ngữ pháp tiếng anh không .\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi . . . . \n", - "\n", - "row 2\n", - "QUESTION: people have wanted to look inside the human mind , the human brain , for thousands of years .\n", - "REAL ANSWER: con người muốn nhìn được vào bên trong ý nghĩ , não người , qua hàng ngàn năm nay ,\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi tôi , , , , , , , , , , , , , . \n", - "\n", - "row 3\n", - "QUESTION: a freshly waxed car , the water molecules slump to about 90 degrees .\n", - "REAL ANSWER: một xe vừa bôi sáp , những phân tử nước sụt xuống gần ̣ 90 độ .\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , . 
\n", - "\n", - "row 4\n", - "QUESTION: made in 1939 , the film is older than most of our members ' grandparents .\n", - "REAL ANSWER: được làm vào năm 1939 , bộ phim có tuổi già hơn tuổi của hầu hết ông bà của các thành viên\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , , , , , . . . \n", - "\n" - ] + "data": { + "text/plain": [ + "0.053475615" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" ] }, { diff --git a/neural-machine-translation/15.gru-seq2seq-luong.ipynb b/neural-machine-translation/15.gru-seq2seq-luong.ipynb index 70a33e2..df5dab4 100644 --- a/neural-machine-translation/15.gru-seq2seq-luong.ipynb +++ b/neural-machine-translation/15.gru-seq2seq-luong.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" ] }, { @@ -21,93 +16,42 @@ "metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count 
= 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], + "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { 
"cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { @@ -116,10 +60,8 @@ "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -128,207 +70,696 @@ "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", " 
return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", " \n", " self.X = tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, 
tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = 
tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "size_layer = 256\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: 
GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", "sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - 
"source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(174, 220)" + "[array([[[ 1, 31355, 31355, 8391, 8391, 8391, 20320, 20320, 20320,\n", + " 20320, 24333, 24333, 24333, 20118, 3659, 3461, 22930, 27599,\n", + " 27599, 15847, 14110, 2017, 2017, 3775, 3775, 30113, 7346,\n", + " 7346, 7346, 9751, 9751, 14563, 14563, 14563, 14563, 12853,\n", + " 23246]],\n", + " \n", + " [[ 1, 7163, 30412, 30412, 30412, 30412, 18203, 18203, 1554,\n", + " 1554, 1554, 31635, 31635, 31635, 2533, 2533, 2533, 2533,\n", + " 26266, 9881, 9881, 9881, 9881, 9881, 14189, 14189, 14189,\n", + " 14189, 4451, 4451, 23314, 23314, 23314, 23314, 23314, 23314,\n", + " 23314]],\n", + " \n", + " [[ 1, 13562, 13562, 13562, 20672, 23171, 23171, 382, 382,\n", + " 9725, 9725, 13189, 13189, 382, 20454, 382, 20454, 20210,\n", + " 20210, 21113, 21113, 21113, 17886, 12136, 12136, 24183, 24183,\n", + " 30130, 30130, 24071, 24071, 11760, 11760, 10310, 10310, 10431,\n", + " 10431]],\n", + " \n", + " [[ 1, 12254, 4818, 4818, 25832, 25832, 25832, 22218, 22218,\n", + " 22218, 23526, 22218, 2719, 30227, 30227, 5097, 5097, 19840,\n", + " 3066, 3066, 10046, 10046, 10046, 21592, 21592, 25990, 25990,\n", + " 19840, 19840, 19840, 19840, 13620, 13620, 22695, 22695, 22695,\n", + " 22695]],\n", + " \n", + " [[ 1, 20135, 20135, 22087, 22087, 31451, 31451, 21209, 21209,\n", + " 21209, 1943, 1943, 28068, 28068, 12894, 7205, 7205, 7205,\n", + " 13823, 13823, 13823, 
13823, 4392, 4392, 4392, 4392, 27421,\n", + " 26466, 26466, 28057, 28057, 28057, 14375, 14375, 14375, 20963,\n", + " 27498]],\n", + " \n", + " [[ 1, 27138, 27138, 27138, 25010, 25010, 25010, 20380, 20380,\n", + " 20380, 19811, 19811, 19811, 19811, 19811, 15875, 19811, 6514,\n", + " 6514, 24676, 24676, 24676, 4520, 4520, 4520, 28453, 28453,\n", + " 28453, 23465, 29413, 29413, 29413, 29413, 29413, 20898, 30023,\n", + " 21144]],\n", + " \n", + " [[ 1, 10318, 17478, 18330, 8732, 8732, 16136, 30655, 8732,\n", + " 22365, 14736, 14736, 14736, 14736, 1542, 1198, 1198, 1198,\n", + " 8456, 24803, 24803, 24803, 21390, 7640, 4841, 4841, 10505,\n", + " 31051, 31051, 7638, 7638, 7029, 7029, 8346, 8346, 8346,\n", + " 14173]],\n", + " \n", + " [[ 1, 21494, 15133, 26372, 26372, 26372, 15071, 3980, 62,\n", + " 62, 62, 62, 3019, 3019, 3019, 3019, 789, 789,\n", + " 789, 15274, 15274, 23171, 23171, 23171, 12225, 12225, 12225,\n", + " 4831, 4831, 4831, 5821, 10572, 10572, 5821, 15126, 19280,\n", + " 24388]],\n", + " \n", + " [[ 1, 30042, 30042, 10888, 4595, 19505, 11539, 11539, 11539,\n", + " 11539, 17978, 11539, 17978, 11539, 26835, 1646, 26835, 16042,\n", + " 16042, 25072, 12932, 12932, 15892, 4584, 31252, 24348, 1012,\n", + " 1012, 1012, 19455, 19455, 19455, 5586, 5586, 5586, 4118,\n", + " 4118]],\n", + " \n", + " [[ 1, 12846, 5753, 5753, 9741, 9741, 1927, 1927, 4514,\n", + " 22473, 22473, 22473, 22473, 19232, 16900, 16900, 13423, 12375,\n", + " 12375, 18120, 18120, 18120, 17560, 17560, 17560, 17560, 27177,\n", + " 27177, 25701, 14261, 14261, 14261, 14261, 25729, 25729, 25729,\n", + " 21432]]], dtype=int32), 10.372889, 0.0]" ] }, - "execution_count": 13, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", "\n", - 
"maxlen_question, maxlen_answer" + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:14<00:00, 2.82it/s, accuracy=0.131, cost=6.32]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 6.28it/s, accuracy=0.129, cost=6.01]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" ] }, { @@ -337,39 +768,27 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: it 's a lot of water-based materials like concrete , water-based paint , mud , and also some refined oils as well .\n", - "REAL ANSWER: còn rất nhiều chất có nước như bê tông , sơn có chứa nước , bùn , và một số loại dầu tinh chế nữa .\n", - "PREDICTED ANSWER: chúng là là một một . . một một , , , , , , , , , , , , , , và . . . \n", - "\n", - "row 2\n", - "QUESTION: now imagine that you will soon be able to look inside your brain and select brain areas to do that same thing .\n", - "REAL ANSWER: bây giờ hãy tưởng tượng bạn sớm được nhìn vào bên trong não mình và được chọn vùng trên não để làm cùng một việc đó .\n", - "PREDICTED ANSWER: và tôi tôi , , bạn bạn , , , tôi tôi tôi tôi tôi tôi tôi , , , , , . . . . 
\n", - "\n", - "row 3\n", - "QUESTION: there have been three ways to try to impact the brain : the therapist 's couch , pills and the knife .\n", - "REAL ANSWER: có ba cách để làm ảnh hưởng đến não : giường của nhà trị liệu học , thuốc viên và con dao .\n", - "PREDICTED ANSWER: và tôi có , , , , của , của của của của của của của của của của . . . . . \n", - "\n", - "row 4\n", - "QUESTION: we run enormous models on supercomputers ; this is what i happen to do .\n", - "REAL ANSWER: chúng tôi chạy những mô hình khổng lồ trên siêu máy tính ; đây là công việc của tôi .\n", - "PREDICTED ANSWER: chúng tôi tôi chúng chúng tôi của của của của của của và và và tôi . . . \n", - "\n" - ] + "data": { + "text/plain": [ + "0.01888038" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/neural-machine-translation/16.basic-seq2seq-bahdanau.ipynb b/neural-machine-translation/16.basic-seq2seq-bahdanau.ipynb index 76bbec9..009e4ca 100644 --- a/neural-machine-translation/16.basic-seq2seq-bahdanau.ipynb +++ b/neural-machine-translation/16.basic-seq2seq-bahdanau.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + 
"os.environ['CUDA_VISIBLE_DEVICES'] = '0'" ] }, { @@ -21,93 +16,42 @@ "metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], + "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' 
'.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { @@ -116,10 +60,8 @@ "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -128,217 +70,693 @@ "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " 
def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", " \n", " self.X = tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " 
dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = 
self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "size_layer = 256\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 
9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "WARNING:tensorflow:From :6: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will 
be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" + "Use `tf.cast` instead.\n" ] } ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, 
len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", "sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(174, 220)" + "[array([[[ 1, 19508, 9278, 18378, 13421, 30227, 9547, 8103, 1436,\n", + " 11862, 9379, 30948, 23919, 473, 8926, 31710, 21259, 10352,\n", + " 31648, 27234, 28328, 24074, 15440, 5764, 16271, 23236, 27236,\n", + " 2995, 19841, 13161, 30411, 763, 19158, 958, 24045, 9018,\n", + " 22377]],\n", + " \n", + " [[ 1, 12333, 1338, 23573, 5338, 16627, 12571, 1592, 25791,\n", + " 21522, 1206, 14757, 2262, 28976, 10605, 9821, 2487, 26057,\n", + " 7672, 30862, 17629, 14025, 3911, 12018, 18918, 365, 24643,\n", + " 29542, 19069, 20516, 26645, 8063, 16352, 11390, 2916, 469,\n", + " 28543]],\n", + " \n", + " [[ 1, 20497, 10163, 30573, 24962, 1681, 7141, 31867, 25130,\n", + " 15475, 8273, 5719, 912, 30318, 18071, 20755, 17251, 17557,\n", + " 18977, 24487, 17478, 17010, 21247, 26161, 25048, 13653, 1981,\n", + " 7221, 20509, 2517, 22225, 30175, 8321, 13639, 29411, 23374,\n", + " 18908]],\n", + " \n", + " [[ 1, 20467, 11620, 5707, 85, 21637, 688, 7295, 22417,\n", + " 14160, 9683, 3109, 9325, 4464, 9808, 4361, 26377, 7592,\n", + " 1945, 18209, 17419, 4291, 21007, 2596, 
11509, 12755, 29303,\n", + " 5200, 1023, 16453, 8536, 17126, 17372, 6934, 24325, 5017,\n", + " 22866]],\n", + " \n", + " [[ 1, 11850, 16139, 18957, 15054, 25209, 23578, 13107, 31689,\n", + " 5375, 26292, 23499, 5660, 16223, 27307, 355, 24318, 24740,\n", + " 5719, 15417, 1002, 25899, 20801, 1790, 9768, 9260, 4893,\n", + " 17687, 4682, 24347, 19688, 9024, 17592, 29935, 12535, 6688,\n", + " 2584]],\n", + " \n", + " [[ 1, 24629, 8769, 13713, 13110, 1502, 25302, 6304, 29498,\n", + " 13531, 4383, 30836, 22799, 21753, 27651, 4978, 27832, 15796,\n", + " 30607, 4595, 27355, 14060, 996, 28285, 18961, 18827, 4356,\n", + " 2321, 28592, 17991, 18805, 31952, 29478, 14067, 28438, 20360,\n", + " 29087]],\n", + " \n", + " [[ 1, 18632, 18968, 28895, 9299, 31331, 26748, 2449, 10751,\n", + " 20199, 27608, 4114, 4817, 12796, 16589, 15470, 9478, 1357,\n", + " 26938, 5087, 3503, 29490, 3367, 26306, 4189, 3665, 16814,\n", + " 18023, 9028, 21122, 30226, 20364, 29405, 28264, 24625, 3761,\n", + " 19144]],\n", + " \n", + " [[ 1, 13723, 23138, 21403, 28446, 16334, 20545, 12848, 31983,\n", + " 29888, 21426, 25040, 27843, 28867, 26211, 19652, 22463, 31315,\n", + " 4978, 22348, 10681, 31267, 18579, 29410, 30179, 2336, 18071,\n", + " 26222, 10380, 28659, 13945, 9503, 14898, 6435, 1099, 9663,\n", + " 14145]],\n", + " \n", + " [[ 1, 9843, 16089, 15138, 24019, 2157, 17632, 18295, 9263,\n", + " 1692, 1399, 17040, 29845, 31704, 1319, 12114, 9210, 29518,\n", + " 13800, 29021, 9399, 4239, 7238, 10353, 15835, 5493, 25934,\n", + " 15468, 15998, 27088, 15636, 30488, 20945, 8483, 31810, 27668,\n", + " 5178]],\n", + " \n", + " [[ 1, 20045, 2958, 28190, 9356, 2041, 21042, 19808, 1221,\n", + " 12819, 13552, 12733, 20286, 9273, 11052, 13770, 15598, 28291,\n", + " 30141, 14566, 12663, 20539, 21499, 30993, 3737, 13857, 15315,\n", + " 4374, 16971, 30049, 10432, 4260, 26441, 18698, 20369, 27274,\n", + " 28442]]], dtype=int32), 10.371715, 0.0]" ] }, - "execution_count": 13, + "execution_count": 11, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", "\n", - "maxlen_question, maxlen_answer" + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [07:22<00:00, 3.53it/s, accuracy=0.107, cost=6.92] \n", + "minibatch loop: 100%|██████████| 40/40 [00:05<00:00, 7.06it/s, accuracy=0.108, cost=6.71]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" ] }, { @@ -347,39 +765,27 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: as one 12-year-old said after watching " wizard of oz , " " every person should watch this , because unless you do you may not know that you too have a heart . "\n", - "REAL ANSWER: như một đứa trẻ 12 tuổi nói sau khi xem " phù thuỷ xứ oz " " mọi người nên xem phim này , bới vì nếu không xem mọi người sẽ có thể không biết mình cũng có trái tim "\n", - "PREDICTED ANSWER: và tôi là , , , , , tôi . . , , và , , , , , , , , , , bạn , và , , , và \n", - "\n", - "row 2\n", - "QUESTION: biohackers work alone .\n", - "REAL ANSWER: những hacker sinh học làm việc đơn lẻ .\n", - "PREDICTED ANSWER: và tôi là , . 
\n", - "\n", - "row 3\n", - "QUESTION: and what narrative , what history , what identity , what moral code are we imparting to our young ?\n", - "REAL ANSWER: và chuyện tường thuật nào , lịch sử nào , bản sắc nào , qui tắc đạo đức nào mà chúng ta đang truyền đạt lại cho thế hệ trẻ của chúng ta ?\n", - "PREDICTED ANSWER: và tôi tôi . , , một , , , , , . và và và , \n", - "\n", - "row 4\n", - "QUESTION: it 's spaces like these that spawned personal computing .\n", - "REAL ANSWER: chính những nơi như thế này đã sản sinh ra máy tính cá nhân .\n", - "PREDICTED ANSWER: và tôi là tôi tôi , , , , tôi . . \n", - "\n" - ] + "data": { + "text/plain": [ + "0.00020161743" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/neural-machine-translation/17.lstm-seq2seq-bahdanau.ipynb b/neural-machine-translation/17.lstm-seq2seq-bahdanau.ipynb index c0c85d0..de54c8c 100644 --- a/neural-machine-translation/17.lstm-seq2seq-bahdanau.ipynb +++ b/neural-machine-translation/17.lstm-seq2seq-bahdanau.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" ] }, { @@ -21,24 +16,9 @@ 
"metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { @@ -47,63 +27,31 @@ "metadata": {}, "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 657\n", - "Most common words [('you', 132), ('is', 78), ('i', 68), ('what', 51), ('it', 50), ('that', 49)]\n", - "Sample data [7, 28, 129, 35, 61, 42, 12, 22, 82, 225] ['what', 'good', 'stuff', 'she', 'okay', 'they', 'do', 'to', 'hey', 'sweet']\n", - "filtered vocab size: 661\n", - "% of vocab used: 100.61%\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, 
vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 660\n", - "Most common words [('i', 97), ('you', 91), ('is', 62), ('it', 58), ('not', 47), ('what', 39)]\n", - "Sample data [12, 216, 5, 4, 94, 25, 59, 10, 8, 79] ['the', 'real', 'you', 'i', 'hope', 'so', 'they', 'do', 'not', 'hi']\n", - "filtered vocab size: 664\n", - "% of vocab used: 100.61%\n" - ] - } - ], + "outputs": [], "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { @@ -112,10 +60,8 @@ "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -124,208 +70,693 @@ "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + 
"from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", " \n", " self.X = tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " 
sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = 
tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 8, "metadata": {}, "outputs": [], "source": [ - "size_layer = 128\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, 
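The `pad_second_dim` helper added in the hunk above zero-pads the time axis of a `[batch, time, vocab]` logits tensor so it lines up with `tf.reduce_max(self.Y_seq_len)` before `sequence_loss` is applied. A minimal NumPy sketch of the same operation (for illustration only — the notebook itself builds the padding with `tf.tile` and `tf.concat` so the shapes stay dynamic):

```python
import numpy as np

def pad_second_dim(x, desired_size):
    """Append zero frames along axis 1 until x.shape[1] == desired_size.

    x: array of shape [batch, time, depth], mirroring the notebook's
    [batch, time, vocab_size] training logits.
    """
    batch, time, depth = x.shape
    padding = np.zeros((batch, desired_size - time, depth), dtype=x.dtype)
    return np.concatenate([x, padding], axis=1)

logits = np.ones((2, 3, 4), dtype=np.float32)
padded = pad_second_dim(logits, 5)
print(padded.shape)  # (2, 5, 4)
```

The padded frames are all zeros, so they contribute nothing to the masked `sequence_loss` (the `tf.sequence_mask` weights are zero past each target's true length).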
{ "cell_type": "code", - "execution_count": 10, + "execution_count": 9, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", 
+ "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", 
"sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "(10, 12)" + "[array([[[ 1, 27010, 27010, 26352, 26352, 26352, 15843, 15843, 18690,\n", + " 16846, 12084, 11243, 10399, 10399, 10399, 10399, 17067, 31567,\n", + " 29143, 25594, 25594, 25594, 25594, 25594, 209, 209, 209,\n", + " 14074, 14074, 14074, 27422, 22091, 22091, 13153, 10120, 10120,\n", + " 10120]],\n", + " \n", + " [[ 1, 4090, 9499, 9499, 4513, 8146, 8146, 8146, 14797,\n", + " 14797, 16774, 16774, 16776, 16776, 16776, 381, 18044, 6453,\n", + " 18432, 13834, 29884, 29794, 29794, 6422, 6422, 6422, 6422,\n", + " 15892, 15892, 15892, 4184, 4184, 21205, 28975, 28975, 28562,\n", + " 23045]],\n", + " \n", + " [[ 1, 17575, 1279, 6810, 6810, 25159, 21790, 21790, 18539,\n", + " 18539, 18539, 3599, 3599, 21967, 21967, 21967, 16596, 16596,\n", + " 16596, 12864, 12864, 6171, 6171, 5049, 6171, 5240, 9922,\n", + " 5240, 10651, 10651, 10651, 10651, 20684, 20684, 10651, 16677,\n", + " 16677]],\n", + " \n", + " [[ 1, 18458, 11618, 3343, 1509, 28404, 28404, 21862, 17578,\n", + " 10977, 10977, 16441, 21229, 21229, 21229, 14411, 21671, 21671,\n", + " 21671, 11295, 31557, 31557, 31557, 31557, 21248, 21248, 21248,\n", + " 21248, 2403, 2403, 2403, 2060, 2060, 22179, 22179, 16479,\n", + " 18313]],\n", + " \n", + " [[ 1, 5810, 2748, 
27155, 5146, 27155, 7143, 7143, 20850,\n", + " 26002, 3322, 3322, 9349, 9349, 28592, 28592, 7483, 7483,\n", + " 7483, 19076, 6089, 6089, 6089, 6089, 911, 21204, 25057,\n", + " 25057, 25057, 25057, 25057, 25057, 25057, 25057, 14103, 14103,\n", + " 14103]],\n", + " \n", + " [[ 1, 27084, 12576, 16215, 16215, 8161, 2146, 8730, 8730,\n", + " 11113, 10088, 10088, 10088, 10088, 10088, 655, 655, 655,\n", + " 655, 24171, 24171, 7517, 9209, 10502, 10502, 22122, 6575,\n", + " 6890, 24896, 24896, 24896, 14128, 24896, 24181, 24181, 24181,\n", + " 29239]],\n", + " \n", + " [[ 1, 11917, 11309, 29918, 27692, 27692, 4491, 21722, 11930,\n", + " 11930, 22198, 22198, 31649, 30513, 30513, 30513, 31109, 31109,\n", + " 29073, 20340, 20340, 29248, 20522, 20522, 20522, 11281, 11281,\n", + " 11281, 23090, 11281, 23090, 11281, 31946, 31946, 10449, 10449,\n", + " 22265]],\n", + " \n", + " [[ 1, 15119, 17204, 16128, 16128, 17805, 26834, 26834, 26834,\n", + " 23776, 30759, 30759, 7642, 7642, 505, 505, 505, 30118,\n", + " 26259, 26259, 24811, 24811, 24811, 24811, 18579, 18579, 18579,\n", + " 28695, 22035, 22035, 22035, 22035, 22035, 22035, 6106, 16752,\n", + " 16752]],\n", + " \n", + " [[ 1, 23424, 30616, 17990, 17990, 25905, 5303, 1707, 25357,\n", + " 25357, 25694, 28151, 11939, 11939, 11939, 11206, 11206, 11206,\n", + " 25392, 11283, 21879, 21879, 21879, 22502, 6230, 11076, 11076,\n", + " 12883, 7152, 7152, 7152, 7152, 27671, 20468, 20468, 20468,\n", + " 20468]],\n", + " \n", + " [[ 1, 13302, 13302, 20484, 20484, 28214, 16964, 16964, 16964,\n", + " 13056, 13056, 13056, 4479, 4479, 23593, 23593, 23593, 23593,\n", + " 23593, 9637, 2292, 2292, 9637, 9637, 14551, 14551, 14551,\n", + " 8659, 8659, 1281, 1281, 1739, 1739, 4399, 4399, 13485,\n", + " 13485]]], dtype=int32), 10.373304, 0.0]" ] }, - "execution_count": 13, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in 
Y]) * 2\n", + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", "\n", - "maxlen_question, maxlen_answer" + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:30<00:00, 2.74it/s, accuracy=0.131, cost=6.48]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 6.25it/s, accuracy=0.118, cost=6.32]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" ] }, { @@ -334,38 +765,19 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: yes my love\n", - "REAL ANSWER: these gentlemen are from salzburg\n", - "PREDICTED ANSWER: i is not it \n", - "\n", - "row 2\n", - "QUESTION: so which dakota you from\n", - "REAL ANSWER: north actually how would you\n", - "PREDICTED ANSWER: i is not it \n", - "\n", - "row 3\n", - "QUESTION: i used an accelerant\n", - "REAL ANSWER: yeah what kind\n", - "PREDICTED ANSWER: i is \n", - "\n", - "row 4\n", - "QUESTION: what anger\n", - "REAL ANSWER: about the ballet\n", - "PREDICTED ANSWER: i is \n", - "\n" - ] + "data": { + "text/plain": [ + "0.048261568" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n 
not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" ] }, { diff --git a/neural-machine-translation/18.gru-seq2seq-bahdanau.ipynb b/neural-machine-translation/18.gru-seq2seq-bahdanau.ipynb index 3662c72..07d2c6b 100644 --- a/neural-machine-translation/18.gru-seq2seq-bahdanau.ipynb +++ b/neural-machine-translation/18.gru-seq2seq-bahdanau.ipynb @@ -6,13 +6,8 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" ] }, { @@ -21,115 +16,52 @@ "metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], + "outputs": [], "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = 
fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], + "outputs": [], "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: 
%d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, "outputs": [], "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" ] }, { @@ -138,52 +70,92 @@ "metadata": {}, "outputs": [], "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", " \n", " def cells(reuse=False):\n", " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", " \n", " self.X = tf.placeholder(tf.int32, [None, None])\n", " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", " batch_size = tf.shape(self.X)[0]\n", " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = 
tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " last_output, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", " \n", - " attention_mechanism = 
tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_embedded)\n", - " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " last_state = tuple(last_state[0][-1] for _ in range(num_layers))\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", " targets = self.Y,\n", " weights = masks)\n", " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", " y_t = tf.cast(y_t, tf.int32)\n", " self.prediction = tf.boolean_mask(y_t, masks)\n", " mask_label = tf.boolean_mask(self.Y, masks)\n", " correct_pred = tf.equal(self.prediction, mask_label)\n", " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def 
symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" ] }, { @@ -192,11 +164,11 @@ "metadata": {}, "outputs": [], "source": [ - "size_layer = 128\n", + "size_layer = 512\n", "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", "epoch = 20" ] }, @@ -204,12 +176,59 @@ "cell_type": "code", "execution_count": 10, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :29: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future 
version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + 
"\n", + "WARNING:tensorflow:From :49: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", "sess.run(tf.global_variables_initializer())" ] }, @@ -219,24 +238,89 @@ "metadata": {}, "outputs": [], "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 13872, 13872, 31767, 11095, 11095, 11095, 20258, 20258,\n", + " 20258, 20258, 6960, 11142, 20354, 20354, 20354, 20354, 31683,\n", + " 31683, 26948, 5179, 4466, 4466, 4466, 4466, 5385, 5385,\n", + " 5385, 18081, 18081, 18081, 7270, 14117, 27127, 29460, 19935,\n", + " 6996]],\n", + " \n", + " [[ 1, 21436, 9561, 9561, 7197, 9561, 7197, 5339, 
26032,\n", + " 5339, 4247, 21722, 4247, 31038, 26507, 26507, 26507, 26507,\n", + " 26507, 15165, 15165, 15165, 22856, 22856, 24925, 24925, 17107,\n", + " 17107, 13494, 21013, 21013, 21013, 17107, 5741, 5741, 5741,\n", + " 29063]],\n", + " \n", + " [[ 1, 18442, 23238, 23238, 25687, 25687, 1263, 1263, 13867,\n", + " 13867, 21591, 21591, 21591, 13440, 13440, 14704, 14704, 16520,\n", + " 16520, 1165, 1165, 20886, 20886, 20886, 26485, 13691, 13691,\n", + " 13691, 13691, 17436, 17436, 11480, 11480, 11480, 24671, 19593,\n", + " 3945]],\n", + " \n", + " [[ 1, 6987, 6987, 15750, 13615, 13615, 13615, 13615, 13615,\n", + " 13615, 21760, 20616, 14788, 31935, 31935, 30042, 30042, 9703,\n", + " 10424, 10424, 10424, 10424, 10424, 10424, 29130, 29130, 17580,\n", + " 17580, 17580, 22712, 22712, 8363, 8363, 8363, 8363, 8363,\n", + " 28161]],\n", + " \n", + " [[ 1, 6904, 10292, 6904, 4559, 29435, 20541, 16804, 16804,\n", + " 15974, 15974, 24759, 24759, 25652, 25652, 19656, 26384, 26384,\n", + " 26384, 26384, 10083, 10083, 10083, 16539, 16539, 31625, 31625,\n", + " 24519, 24519, 17010, 3843, 3843, 3843, 3843, 31827, 31827,\n", + " 31827]],\n", + " \n", + " [[ 1, 24049, 3678, 3678, 3678, 29534, 29534, 29534, 29534,\n", + " 25344, 25344, 15610, 14812, 4991, 4991, 4991, 4991, 2925,\n", + " 2925, 3374, 3374, 3374, 15182, 9953, 9953, 9953, 9953,\n", + " 5040, 28844, 28844, 28844, 28844, 28844, 17223, 17223, 17223,\n", + " 17223]],\n", + " \n", + " [[ 1, 18918, 19900, 19900, 18515, 18515, 18515, 18515, 26826,\n", + " 26826, 20712, 20712, 20712, 2096, 2096, 20712, 22791, 22791,\n", + " 22791, 22791, 22791, 22791, 22791, 22791, 22571, 31728, 13564,\n", + " 13564, 13564, 13564, 13564, 13564, 13564, 10383, 10383, 10383,\n", + " 2889]],\n", + " \n", + " [[ 1, 19322, 26095, 26095, 26095, 13011, 20536, 6341, 6341,\n", + " 6341, 6341, 713, 713, 713, 18986, 18986, 12697, 12697,\n", + " 24567, 24567, 24567, 24567, 15730, 15730, 15730, 15730, 13663,\n", + " 13663, 13663, 13663, 1845, 22560, 22560, 
31424, 30920, 22135,\n", + " 29185]],\n", + " \n", + " [[ 1, 4315, 4315, 4315, 30456, 30456, 30456, 30784, 30784,\n", + " 30784, 23330, 9414, 9414, 16503, 16503, 22887, 22887, 15914,\n", + " 27083, 27083, 27083, 27083, 27083, 27083, 27083, 172, 21519,\n", + " 21519, 21519, 21519, 23056, 23056, 17458, 17458, 17458, 29126,\n", + " 17458]],\n", + " \n", + " [[ 1, 272, 18498, 18498, 18498, 13936, 13936, 13936, 3903,\n", + " 3903, 3903, 4249, 4249, 25347, 3911, 3911, 12265, 29455,\n", + " 29455, 9967, 9967, 9967, 9967, 9967, 9967, 9967, 22833,\n", + " 22833, 22833, 22833, 23365, 23365, 21688, 2969, 2969, 18484,\n", + " 18484]]], dtype=int32), 10.373586, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" ] }, { @@ -245,21 +329,391 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:45<00:00, 2.67it/s, accuracy=0.142, cost=6.15]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 6.18it/s, accuracy=0.134, cost=6] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: my 
dad was always whistling around the house , and i just thought that 's part of communication in my family .\n", - "REAL ANSWER: bố tôi từng lúc nào cũng huýt gió khắp nơi trong nhà and tôi cứ ngỡ đó là một phần trong cách giao tiếp của gia đình tôi .\n", - "PREDICTED ANSWER: và tôi là là một một những những những , , , , , , , , , , , , , , , , , . . . . \n", - "\n", - "row 2\n", - "QUESTION: cinema is arguably the 20th century 's most influential art form .\n", - "REAL ANSWER: điện ảnh đáng được tranh cãi là dạng nghệ thuật ảnh hưởng nhất trong thế kỉ 20 .\n", - "PREDICTED ANSWER: và tôi là một những những những những của , , , , của . . . \n", - "\n", - "row 3\n", - "QUESTION: indeed , it is hard to find a subject that film has yet to tackle .\n", - "REAL ANSWER: thực sự là , thật khó để tìm một chủ đề mà điện ảnh chưa động đến .\n", - "PREDICTED ANSWER: và tôi , , , tôi , , , , , , , , , , . . \n", - "\n", - "row 4\n", - "QUESTION: i 'm so used to that .\n", - "REAL ANSWER: tôi đã quen với việc đó rồi .\n", - "PREDICTED ANSWER: và tôi là , tôi tôi tôi . 
\n", - "\n" - ] + "data": { + "text/plain": [ + "0.025584696" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/neural-machine-translation/19.basic-birnn-seq2seq-bahdanau.ipynb b/neural-machine-translation/19.basic-birnn-seq2seq-bahdanau.ipynb new file mode 100644 index 0000000..1445296 --- /dev/null +++ b/neural-machine-translation/19.basic-birnn-seq2seq-bahdanau.ipynb @@ -0,0 +1,728 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size 
= 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = 
tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", 
+ " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :48: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for 
updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :58: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 6674, 4754, 29677, 16908, 7077, 26947, 14012, 2190,\n", + " 1348, 29404, 19050, 27073, 5499, 17089, 25249, 6914, 10891,\n", + " 12465, 2871, 6495, 4166, 10777, 17396, 29893, 9527, 31933,\n", + " 14953, 23418, 28066, 30951, 18395, 30777, 18023, 31551, 30066,\n", + " 12959]],\n", + " \n", + " [[ 1, 8133, 23779, 9457, 14795, 18712, 18387, 18146, 19728,\n", + " 22705, 9828, 25007, 10992, 6127, 30693, 21968, 16118, 26370,\n", + " 1342, 12065, 781, 9350, 24331, 4705, 15485, 2872, 3871,\n", + " 6781, 3293, 
18598, 6949, 31833, 1614, 18881, 7732, 31311,\n", + " 1215]],\n", + " \n", + " [[ 1, 22060, 31250, 31981, 7229, 18514, 19460, 13304, 816,\n", + " 12177, 13348, 13149, 5113, 1399, 11426, 6210, 29101, 13486,\n", + " 7699, 4637, 7309, 19154, 10204, 29295, 22412, 2606, 5063,\n", + " 16812, 15476, 23486, 29684, 20674, 16515, 13231, 10828, 22798,\n", + " 29002]],\n", + " \n", + " [[ 1, 15666, 27933, 9481, 7068, 21647, 2459, 31488, 5850,\n", + " 22211, 21970, 13917, 14244, 5818, 960, 10191, 22117, 19237,\n", + " 26404, 8803, 12965, 13348, 13261, 6758, 9657, 11303, 24475,\n", + " 12307, 29832, 25557, 2328, 23337, 29506, 27297, 11446, 366,\n", + " 31105]],\n", + " \n", + " [[ 1, 21699, 8580, 20003, 4374, 18537, 609, 20273, 3353,\n", + " 5388, 12070, 25929, 28269, 13278, 19060, 21559, 30325, 8296,\n", + " 26429, 11679, 26679, 22624, 8136, 8695, 14619, 18946, 898,\n", + " 155, 21249, 25937, 28917, 12772, 7325, 6904, 747, 28572,\n", + " 12157]],\n", + " \n", + " [[ 1, 27561, 13102, 3738, 3010, 8235, 6668, 1755, 20717,\n", + " 24501, 21844, 29527, 10936, 15395, 10933, 16326, 6429, 23388,\n", + " 8499, 24538, 24472, 13548, 24293, 5676, 19067, 19429, 27698,\n", + " 8182, 11404, 16243, 12704, 29182, 13247, 6883, 19272, 3443,\n", + " 24458]],\n", + " \n", + " [[ 1, 26895, 9770, 10309, 30482, 26025, 11431, 18268, 13399,\n", + " 26283, 27303, 28610, 29817, 29428, 24535, 28623, 10650, 21388,\n", + " 433, 18351, 3053, 7825, 18578, 17826, 3888, 14700, 25281,\n", + " 30992, 24086, 22659, 24193, 28137, 24132, 4056, 21255, 19473,\n", + " 11517]],\n", + " \n", + " [[ 1, 7907, 18768, 13037, 31228, 12145, 14222, 27611, 15992,\n", + " 11387, 30998, 19236, 3149, 3979, 18949, 551, 10185, 22042,\n", + " 20442, 24427, 31664, 6294, 11756, 10719, 24469, 6268, 20828,\n", + " 30677, 27276, 95, 7591, 26937, 14849, 14214, 30544, 8306,\n", + " 14752]],\n", + " \n", + " [[ 1, 725, 30189, 7620, 23678, 6394, 9399, 30739, 10072,\n", + " 12704, 4324, 11138, 9754, 5515, 12374, 14955, 12482, 9084,\n", + " 
25043, 19411, 20170, 3600, 26354, 1892, 12385, 10641, 5898,\n", + " 24458, 24906, 25812, 4413, 418, 20969, 26261, 24891, 3547,\n", + " 24046]],\n", + " \n", + " [[ 1, 14327, 8885, 9880, 10537, 28217, 7516, 30681, 24046,\n", + " 13968, 31357, 21085, 25311, 15804, 27510, 28390, 9817, 507,\n", + " 23574, 13167, 8505, 17464, 22029, 27931, 30769, 21887, 29798,\n", + " 14523, 18150, 31755, 7666, 10421, 16748, 25859, 5619, 31352,\n", + " 1716]]], dtype=int32), 10.372685, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:12<00:00, 2.83it/s, accuracy=0.124, cost=6.38]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 5.76it/s, accuracy=0.129, cost=6.04]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.00020161743" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + 
"language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/19.lstm-birnn-seq2seq-luong.ipynb b/neural-machine-translation/19.lstm-birnn-seq2seq-luong.ipynb deleted file mode 100644 index aae3a67..0000000 --- a/neural-machine-translation/19.lstm-birnn-seq2seq-luong.ipynb +++ /dev/null @@ -1,418 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with 
open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in 
data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def attention():\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " 
for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = attention(),\n", - " cell_bw = attention(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw[0].c, state_bw[0].c), -1)\n", - " bi_state_h = tf.concat((state_fw[0].h, state_bw[0].h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " last_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": 
"code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.085053, avg accuracy: 0.882264\n", - "epoch: 2, avg loss: 0.814599, avg accuracy: 0.913018\n", - "epoch: 3, avg loss: 0.754955, avg 
accuracy: 0.912291\n", - "epoch: 4, avg loss: 0.726837, avg accuracy: 0.914618\n", - "epoch: 5, avg loss: 0.719420, avg accuracy: 0.914155\n", - "epoch: 6, avg loss: 0.710243, avg accuracy: 0.915036\n", - "epoch: 7, avg loss: 0.705885, avg accuracy: 0.914918\n", - "epoch: 8, avg loss: 0.700633, avg accuracy: 0.914964\n", - "epoch: 9, avg loss: 0.699920, avg accuracy: 0.914491\n", - "epoch: 10, avg loss: 0.698896, avg accuracy: 0.913927\n", - "epoch: 11, avg loss: 0.690312, avg accuracy: 0.914736\n", - "epoch: 12, avg loss: 0.682916, avg accuracy: 0.915309\n", - "epoch: 13, avg loss: 0.684799, avg accuracy: 0.914700\n", - "epoch: 14, avg loss: 0.673822, avg accuracy: 0.916218\n", - "epoch: 15, avg loss: 0.671391, avg accuracy: 0.916436\n", - "epoch: 16, avg loss: 0.665592, avg accuracy: 0.916800\n", - "epoch: 17, avg loss: 0.657978, avg accuracy: 0.917973\n", - "epoch: 18, avg loss: 0.658992, avg accuracy: 0.917709\n", - "epoch: 19, avg loss: 0.669110, avg accuracy: 0.916891\n", - "epoch: 20, avg loss: 0.649123, avg accuracy: 0.918555\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - 
"execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: we have to fly at a special incline in order to make the measurements .\n", - "REAL ANSWER: chúng tôi phải bay với độ nghiêng đặc biệt để thực hiện các phép đo .\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi , , , , , , , , , . . . \n", - "\n", - "row 2\n", - "QUESTION: and as you come around the banks in these valleys , the forces can get up to two gs .\n", - "REAL ANSWER: khi bay quanh những bờ sông ở thung lũng , các lực tác động có thể lên tới 2g .\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , , , , , , . . . . \n", - "\n", - "row 3\n", - "QUESTION: hi . i 'm going to ask you to raise your arms and wave back , just the way i am -- kind of a royal wave .\n", - "REAL ANSWER: xin chào . tôi đề nghị các bạn giơ tay và vẫy về phía sau như tôi làm đây -- như cách vẫy hoàng gia .\n", - "PREDICTED ANSWER: và tôi là , , , , , , , , , , , , , , , , . . . . \n", - "\n", - "row 4\n", - "QUESTION: some more tough , like , really , what will be the impact on mario 's life ?\n", - "REAL ANSWER: một số băn khoăn nặng nề hơn , như là cuộc sống của mario sẽ bị ảnh hưởng thế nào ?\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , , , , , , . . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/2.lstm-seq2seq-manual.ipynb b/neural-machine-translation/2.lstm-seq2seq-manual.ipynb deleted file mode 100644 index ef6c4df..0000000 --- a/neural-machine-translation/2.lstm-seq2seq-manual.ipynb +++ /dev/null @@ -1,390 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = 
len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), 
('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, 
self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " 
len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.715693, avg accuracy: 0.883155\n", - "epoch: 2, avg loss: 0.814847, avg accuracy: 0.911936\n", - "epoch: 3, avg loss: 0.766373, avg accuracy: 0.912682\n", - "epoch: 4, avg loss: 0.726060, avg accuracy: 0.913045\n", - "epoch: 5, avg loss: 0.715116, avg accuracy: 0.914455\n", - "epoch: 6, avg loss: 0.716877, avg accuracy: 0.913318\n", - "epoch: 7, avg loss: 0.711519, avg 
accuracy: 0.913527\n", - "epoch: 8, avg loss: 0.712105, avg accuracy: 0.913155\n", - "epoch: 9, avg loss: 0.700975, avg accuracy: 0.914818\n", - "epoch: 10, avg loss: 0.695844, avg accuracy: 0.914736\n", - "epoch: 11, avg loss: 0.704287, avg accuracy: 0.913691\n", - "epoch: 12, avg loss: 0.688946, avg accuracy: 0.915609\n", - "epoch: 13, avg loss: 0.687791, avg accuracy: 0.914855\n", - "epoch: 14, avg loss: 0.679813, avg accuracy: 0.915309\n", - "epoch: 15, avg loss: 0.665267, avg accuracy: 0.916300\n", - "epoch: 16, avg loss: 0.666998, avg accuracy: 0.916055\n", - "epoch: 17, avg loss: 0.662029, avg accuracy: 0.915691\n", - "epoch: 18, avg loss: 0.648469, avg accuracy: 0.917382\n", - "epoch: 19, avg loss: 0.663604, avg accuracy: 0.914791\n", - "epoch: 20, avg loss: 0.648736, avg accuracy: 0.917009\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: by the time the pilot had finished , we had the names of a thousand schools that wished to join .\n", - 
"REAL ANSWER: ngay khi thử nghiệm kết thúc , chúng ta đã có tên của hàng ngàn trường học mong muốn được tham gia .\n", - "PREDICTED ANSWER: và tôi , , , , , . , , . , , . . . , , , , , . \n", - "\n", - "row 2\n", - "QUESTION: i 'm here to show you how something you can 't see can be so much fun to look at .\n", - "REAL ANSWER: hôm nay tôi đến đây để chỉ cho các bạn những thứ bạn không thể thấy được nhưng rất thú vị khi nhìn vào đó\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , . . . , . , , , , . . \n", - "\n", - "row 3\n", - "QUESTION: and every one of those scientists is in a research group , and every research group studies a wide variety of topics .\n", - "REAL ANSWER: mỗi một khoa học gia đều thuộc một nhóm nghiên cứu , và mỗi nhóm đều nghiên cứu rất nhiều đề tài đa dạng .\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , , , , , , , , , . . . \n", - "\n", - "row 4\n", - "QUESTION: and as a body of knowledge builds up , it will form one subsection , or one sub-subsection of an assessment like the ipcc , although we have others .\n", - "REAL ANSWER: khi một phần kiến thức dần định hình , nó sẽ tạo thành một tiểu mục , hay một tiểu-tiểu mục trong một bản kiểm định như ở ipcc , mặc dù còn có nhiều bài khác .\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , . . . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/2.lstm-seq2seq.ipynb b/neural-machine-translation/2.lstm-seq2seq.ipynb new file mode 100644 index 0000000..75d2b4e --- /dev/null +++ b/neural-machine-translation/2.lstm-seq2seq.ipynb @@ -0,0 +1,780 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", 
+ "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", 
+ " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ 
+ "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :28: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :31: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from 
tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :39: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + 
"sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 30208, 9466, 21054, 21054, 11991, 11991, 11991, 22891,\n", + " 22891, 24164, 24164, 24164, 24164, 20460, 20460, 3993, 3993,\n", + " 3993, 1103, 27324, 27324, 27324, 27324, 29798, 29798, 29798,\n", + " 29798, 29798, 24228, 17000, 31763, 17000, 18836, 18836, 27529,\n", + " 25128]],\n", + " \n", + " [[ 1, 7734, 12324, 12324, 15213, 15213, 23375, 23375, 2527,\n", + " 2527, 2527, 9949, 9721, 9949, 17717, 30276, 3281, 3281,\n", + " 3281, 3281, 30739, 18470, 18470, 13157, 13157, 19973, 19973,\n", + " 19973, 15052, 15052, 12855, 12855, 12855, 3296, 3296, 3296,\n", + " 10728]],\n", + " \n", + " [[ 1, 11021, 18548, 18548, 10432, 10432, 6823, 6823, 26107,\n", + " 26107, 23252, 15178, 23252, 22902, 16291, 16291, 16291, 27699,\n", + " 10473, 10473, 29507, 29507, 29507, 18835, 18835, 22324, 22947,\n", + " 22947, 22947, 22947, 20711, 20711, 20711, 20711, 22964, 26134,\n", + " 26134]],\n", + " \n", + " [[ 1, 26091, 13779, 25646, 14902, 13686, 14820, 14820, 15,\n", + " 15, 15, 2673, 2673, 14368, 14368, 23046, 31252, 31252,\n", + " 14951, 14951, 22092, 25842, 2491, 2491, 2491, 2491, 2491,\n", + " 9169, 9169, 9169, 9169, 9169, 8789, 8789, 8789, 24681,\n", + " 24681]],\n", + " \n", + " [[ 1, 1706, 1706, 28228, 8811, 8341, 24990, 24990, 1820,\n", + " 1820, 17451, 17451, 31676, 31676, 31676, 5158, 17451, 5158,\n", + " 11627, 11627, 11627, 11627, 12141, 12141, 6660, 6660, 11187,\n", + " 11187, 4193, 13957, 13957, 18767, 18767, 18767, 6037, 6037,\n", + " 31524]],\n", + " \n", + " [[ 1, 1557, 1557, 21575, 21575, 21575, 21575, 21575, 21575,\n", + " 21575, 16826, 16826, 12498, 3693, 30178, 30178, 30178, 30178,\n", + " 17152, 16385, 
14577, 16385, 19040, 13078, 20407, 20407, 20407,\n", + " 13078, 13078, 15969, 15969, 14450, 3015, 3015, 3015, 21675,\n", + " 21675]],\n", + " \n", + " [[ 1, 31527, 31527, 23124, 31573, 31573, 31573, 14700, 14700,\n", + " 24854, 21518, 3275, 15957, 15957, 11109, 11109, 11109, 1877,\n", + " 15556, 2863, 26404, 26404, 26404, 7297, 6343, 6343, 6343,\n", + " 490, 98, 490, 12330, 4901, 4901, 15420, 15420, 13670,\n", + " 13670]],\n", + " \n", + " [[ 1, 23259, 20252, 8506, 7579, 7579, 29914, 29914, 21674,\n", + " 25842, 18450, 22808, 22808, 22808, 14871, 12946, 12946, 14506,\n", + " 17944, 17944, 3041, 21006, 9297, 9297, 9297, 19615, 19615,\n", + " 8856, 6664, 15492, 15492, 7244, 7244, 7244, 7244, 7129,\n", + " 7129]],\n", + " \n", + " [[ 1, 18645, 15910, 19592, 20749, 21493, 21493, 21493, 7087,\n", + " 7087, 7087, 7087, 7087, 12932, 12932, 12932, 12932, 12932,\n", + " 12932, 12932, 12932, 12932, 25064, 25064, 25064, 25064, 23969,\n", + " 23969, 24844, 24844, 16812, 16812, 22829, 22829, 17967, 17967,\n", + " 17967]],\n", + " \n", + " [[ 1, 13741, 8114, 8114, 3362, 3362, 22631, 22631, 22631,\n", + " 16042, 16042, 16042, 26676, 26676, 26676, 18254, 18254, 1540,\n", + " 1540, 1540, 1540, 5762, 5762, 5762, 28019, 5762, 5762,\n", + " 5762, 2367, 2367, 7918, 7918, 7918, 7918, 24939, 25219,\n", + " 25219]]], dtype=int32), 10.373699, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [07:15<00:00, 3.59it/s, accuracy=0.112, cost=6.86]\n", + "minibatch loop: 100%|██████████| 
40/40 [00:04<00:00, 8.27it/s, accuracy=0.113, cost=6.68]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/20.gru-birnn-seq2seq-luong.ipynb b/neural-machine-translation/20.gru-birnn-seq2seq-luong.ipynb deleted file mode 100644 index 5e77424..0000000 --- a/neural-machine-translation/20.gru-birnn-seq2seq-luong.ipynb +++ /dev/null @@ -1,408 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - 
" index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 
389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], 
GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def attention():\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = attention(),\n", - " cell_bw = attention(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw[0],state_bw[0]), -1)\n", - " last_state = tuple([bi_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, 
tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - 
"cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.487517, avg accuracy: 0.884773\n", - "epoch: 2, avg loss: 0.750528, avg accuracy: 0.912964\n", - "epoch: 3, avg loss: 0.720992, avg accuracy: 0.914800\n", - "epoch: 4, avg loss: 0.722265, avg accuracy: 0.913673\n", - "epoch: 5, avg loss: 0.705631, avg accuracy: 0.915891\n", - "epoch: 6, avg loss: 0.705415, avg accuracy: 0.914773\n", - "epoch: 7, avg loss: 0.701500, avg accuracy: 0.915036\n", - "epoch: 8, avg loss: 0.693655, avg accuracy: 0.915027\n", - "epoch: 9, avg loss: 0.679046, avg accuracy: 0.916273\n", - "epoch: 10, avg loss: 0.671368, avg accuracy: 0.916427\n", - "epoch: 11, avg loss: 0.662505, avg accuracy: 0.917173\n", - "epoch: 12, avg loss: 0.652077, avg accuracy: 0.918118\n", - "epoch: 13, avg loss: 0.649267, avg accuracy: 0.917636\n", - "epoch: 14, avg loss: 0.647626, avg accuracy: 0.916918\n", - "epoch: 15, avg loss: 0.635272, avg accuracy: 0.918818\n", - "epoch: 16, avg loss: 0.627029, avg accuracy: 0.919073\n", - "epoch: 17, avg loss: 0.627457, avg accuracy: 0.917945\n", - "epoch: 18, avg loss: 0.613821, avg accuracy: 0.919591\n", - "epoch: 19, avg loss: 0.606262, avg accuracy: 0.920145\n", - "epoch: 20, avg loss: 0.606568, avg accuracy: 0.919445\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " 
total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: but we want to share what was the key learning , the key learning that mario drove to us , and it is to consider what you have as a gift and not only what you miss , and to consider what you miss just as an opportunity .\n", - "REAL ANSWER: nhưng chúng tôi muốn chia sẻ nhận thức cốt yếu , nhận thức cốt yếu mà mario đã đưa chúng tôi tới , rằng hãy coi những gì bạn có một như món quà chứ không phải là những gì bạn đã bỏ lỡ , và coi những gì bạn đã bỏ lỡ chỉ như một cơ hội .\n", - "PREDICTED ANSWER: chúng có có có thể thể một những những của của , , , , , , , có có , , , , , , một thể có thể , , và và một một và và , , , , , , , thể , , , thể thể . \n", - "\n", - "row 2\n", - "QUESTION: there have been three ways to try to impact the brain : the therapist 's couch , pills and the knife .\n", - "REAL ANSWER: có ba cách để làm ảnh hưởng đến não : giường của nhà trị liệu học , thuốc viên và con dao .\n", - "PREDICTED ANSWER: chúng một có có , , của của của của của của của của của của của , , và , và . . \n", - "\n", - "row 3\n", - "QUESTION: we blow it up and look at the pieces .\n", - "REAL ANSWER: chúng tôi cho nó nổ và xem xét từng mảnh nhỏ .\n", - "PREDICTED ANSWER: chúng tôi tôi , , một của của của của . . \n", - "\n", - "row 4\n", - "QUESTION: we honor reading , why not honor watching with the same passion ?\n", - "REAL ANSWER: chúng ta xem trọng việc đọc sách , tại sao không xem trọng việc xem phim với niềm đam mê ?\n", - "PREDICTED ANSWER: chúng tôi tôi một , , , , , những những của của của và của của . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/20.lstm-birnn-seq2seq-bahdanau.ipynb b/neural-machine-translation/20.lstm-birnn-seq2seq-bahdanau.ipynb new file mode 100644 index 0000000..3f9724e --- /dev/null +++ b/neural-machine-translation/20.lstm-birnn-seq2seq-bahdanau.ipynb @@ -0,0 +1,757 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": 
{}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = 
self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " last_state = tuple([bi_lstm_state] * num_layers)\n", + " last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = 
learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + 
"WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O 
related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :50: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :60: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 29677, 30133, 30133, 15849, 15849, 17990, 26783, 9296,\n", + " 17623, 17623, 30450, 30450, 30450, 2668, 2668, 2668, 2668,\n", + " 9979, 9979, 10817, 25898, 25898, 2415, 31880, 18830, 18830,\n", + " 31968, 31860, 31860, 31779, 
31779, 31779, 31779, 31779, 30889,\n", + " 10343]],\n", + " \n", + " [[ 1, 28625, 2824, 1555, 1555, 567, 4665, 4665, 4665,\n", + " 4159, 25073, 22365, 22365, 25073, 25073, 22365, 22365, 22365,\n", + " 1512, 1512, 22365, 22365, 526, 526, 22365, 22365, 526,\n", + " 22365, 8708, 8708, 8708, 8708, 8708, 14406, 14406, 14406,\n", + " 25357]],\n", + " \n", + " [[ 1, 30796, 8794, 15219, 9681, 26029, 26029, 26029, 26029,\n", + " 17587, 17587, 17587, 17587, 10464, 10464, 10464, 21543, 16846,\n", + " 16846, 12605, 12605, 4147, 8431, 19384, 14059, 14059, 1435,\n", + " 22956, 22956, 12922, 17927, 17927, 21663, 21663, 10535, 27358,\n", + " 27358]],\n", + " \n", + " [[ 1, 31714, 31714, 28571, 28571, 6850, 17524, 17524, 28381,\n", + " 28381, 28381, 8733, 22211, 22211, 7192, 7192, 11377, 11377,\n", + " 30897, 30897, 11390, 11390, 11390, 24835, 18748, 18748, 4568,\n", + " 24835, 22142, 893, 22142, 68, 19518, 19518, 26179, 11528,\n", + " 20166]],\n", + " \n", + " [[ 1, 13466, 26408, 18892, 22007, 27052, 23235, 4069, 4069,\n", + " 29585, 22766, 22766, 13147, 13147, 13147, 27424, 27424, 21474,\n", + " 21474, 2656, 2656, 2656, 29531, 1177, 1177, 22270, 3658,\n", + " 29097, 29097, 7023, 8830, 7702, 7702, 14232, 14232, 10014,\n", + " 10014]],\n", + " \n", + " [[ 1, 20497, 23054, 23054, 5279, 15070, 15070, 18520, 18883,\n", + " 12342, 12342, 27659, 15949, 15949, 15949, 18654, 18654, 18654,\n", + " 12661, 12661, 12661, 12661, 19096, 19096, 30439, 30439, 30439,\n", + " 31804, 31804, 31804, 8403, 25440, 25440, 14680, 27067, 27067,\n", + " 27067]],\n", + " \n", + " [[ 1, 19684, 22355, 22355, 26875, 26875, 26875, 26875, 26875,\n", + " 27079, 27079, 27079, 30184, 16978, 16978, 16978, 10372, 10372,\n", + " 10372, 10372, 9004, 9004, 9004, 9004, 9004, 15619, 15712,\n", + " 15712, 15712, 15712, 15712, 31474, 30561, 30561, 30561, 23194,\n", + " 10844]],\n", + " \n", + " [[ 1, 5385, 29135, 29135, 4818, 15122, 15122, 18116, 18116,\n", + " 14234, 17984, 17984, 17984, 17984, 18441, 18441, 1092, 3161,\n", 
+ " 10841, 4278, 4278, 26077, 26077, 26077, 26878, 1920, 5682,\n", + " 5682, 5682, 5682, 5971, 2937, 2937, 2937, 28584, 3637,\n", + " 30212]],\n", + " \n", + " [[ 1, 26365, 26365, 26365, 8293, 7628, 7628, 10133, 7980,\n", + " 7980, 7980, 26537, 627, 627, 627, 627, 627, 13805,\n", + " 877, 877, 877, 877, 877, 877, 30117, 9406, 9406,\n", + " 9406, 9406, 9406, 9406, 9406, 26735, 26735, 2897, 2897,\n", + " 2897]],\n", + " \n", + " [[ 1, 28441, 4381, 4381, 4381, 4381, 21041, 21041, 21041,\n", + " 21041, 21041, 21041, 11387, 11387, 11387, 14820, 14820, 14820,\n", + " 18413, 18413, 18413, 18413, 18413, 2800, 29814, 15107, 15107,\n", + " 15107, 26337, 17125, 26337, 26337, 17125, 1885, 1885, 2155,\n", + " 2155]]], dtype=int32), 10.374287, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [12:11<00:00, 2.14it/s, accuracy=0.126, cost=6.27]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.91it/s, accuracy=0.167, cost=6.07]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.054097746" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/21.gru-birnn-seq2seq-bahdanau.ipynb b/neural-machine-translation/21.gru-birnn-seq2seq-bahdanau.ipynb new file mode 100644 index 0000000..54d5dd2 --- /dev/null +++ b/neural-machine-translation/21.gru-birnn-seq2seq-bahdanau.ipynb @@ -0,0 +1,741 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in 
train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " 
last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " 
initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and 
will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, 
please file an issue.\n", + "\n", + "WARNING:tensorflow:From :48: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :58: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 18052, 18052, 19804, 19804, 19804, 4313, 6055, 6155,\n", + " 9694, 9694, 11207, 11207, 347, 1959, 1959, 1959, 8028,\n", + " 8028, 8028, 8028, 15119, 15119, 15119, 3252, 3252, 3252,\n", + " 3252, 9799, 9799, 9799, 26138, 20918, 20918, 20918, 22175,\n", + " 26691]],\n", + " \n", + " [[ 1, 25834, 
25834, 20262, 20262, 26498, 272, 272, 31970,\n", + " 272, 272, 272, 272, 16273, 16273, 16273, 15479, 15479,\n", + " 15479, 9500, 1841, 1841, 3809, 3809, 3809, 3809, 9120,\n", + " 14024, 14024, 14024, 14024, 14024, 14024, 10565, 10565, 10565,\n", + " 10565]],\n", + " \n", + " [[ 1, 234, 234, 31195, 1235, 1235, 14508, 14508, 14508,\n", + " 14508, 24206, 24206, 27747, 21990, 21990, 21448, 31056, 29858,\n", + " 17227, 17227, 29858, 19343, 15649, 15649, 9389, 9389, 16711,\n", + " 16711, 20530, 20530, 20530, 28698, 28698, 3466, 3466, 3466,\n", + " 3466]],\n", + " \n", + " [[ 1, 24476, 5076, 17148, 15651, 15651, 19435, 20863, 20863,\n", + " 14471, 14471, 14471, 14471, 22799, 22799, 19957, 19957, 19957,\n", + " 4242, 4242, 19957, 19957, 19957, 4242, 19957, 4242, 19957,\n", + " 19957, 4242, 20035, 20035, 20035, 20035, 20035, 20035, 17816,\n", + " 17816]],\n", + " \n", + " [[ 1, 15449, 19936, 30733, 27969, 8262, 24545, 13521, 13521,\n", + " 30860, 28321, 9316, 9316, 7601, 7601, 7601, 3010, 31887,\n", + " 31887, 25504, 25504, 25504, 28998, 28998, 29480, 29480, 29480,\n", + " 29480, 22594, 22594, 10129, 31283, 31283, 17379, 17379, 11573,\n", + " 11573]],\n", + " \n", + " [[ 1, 2187, 3420, 18734, 17045, 17045, 11537, 7711, 18094,\n", + " 14826, 15414, 15414, 27121, 27378, 27378, 27378, 4037, 13301,\n", + " 13301, 29573, 21329, 21329, 29573, 29573, 9041, 9041, 9041,\n", + " 6408, 6408, 6408, 14058, 14058, 14058, 14058, 14058, 14058,\n", + " 7533]],\n", + " \n", + " [[ 1, 23277, 23277, 7019, 7019, 21542, 7086, 30131, 22524,\n", + " 22524, 16353, 16353, 4717, 4717, 4582, 31085, 16722, 31085,\n", + " 31085, 31085, 22853, 22853, 22853, 22853, 4518, 4518, 22163,\n", + " 9763, 11970, 19872, 13119, 1534, 7538, 10112, 10112, 7888,\n", + " 14240]],\n", + " \n", + " [[ 1, 24490, 24490, 16276, 16276, 16276, 16276, 14250, 4446,\n", + " 4446, 4446, 21033, 8026, 8026, 21033, 26065, 26065, 26065,\n", + " 11730, 11730, 15458, 15458, 30686, 30686, 30686, 12996, 12996,\n", + " 12996, 12996, 
11779, 11779, 11779, 11779, 11779, 7451, 7451,\n", + " 7451]],\n", + " \n", + " [[ 1, 181, 181, 181, 181, 181, 181, 181, 181,\n", + " 21256, 21256, 21256, 181, 181, 181, 181, 181, 181,\n", + " 21256, 21256, 181, 22665, 22665, 22665, 6179, 6179, 6179,\n", + " 6179, 6179, 6179, 5450, 1589, 1589, 1589, 21556, 21556,\n", + " 21556]],\n", + " \n", + " [[ 1, 9035, 9035, 9035, 9035, 8442, 3818, 3818, 3818,\n", + " 3818, 20738, 14514, 20738, 14514, 22783, 22783, 22783, 8672,\n", + " 8672, 12343, 12343, 12343, 12343, 22865, 21539, 21539, 17755,\n", + " 25154, 25154, 25154, 25154, 25023, 25023, 25023, 25023, 25023,\n", + " 25736]]], dtype=int32), 10.373411, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [12:31<00:00, 2.08it/s, accuracy=0.125, cost=6.2] \n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.76it/s, accuracy=0.113, cost=6.02]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.00020161743" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] 
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/21.lstm-birnn-seq2seq-bahdanau.ipynb b/neural-machine-translation/21.lstm-birnn-seq2seq-bahdanau.ipynb deleted file mode 100644 index f3b446d..0000000 --- a/neural-machine-translation/21.lstm-birnn-seq2seq-bahdanau.ipynb +++ /dev/null @@ -1,418 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - 
"execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, 
vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def attention():\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer//2, \n", - " memory = 
encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = attention(),\n", - " cell_bw = attention(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw[0].c, state_bw[0].c), -1)\n", - " bi_state_h = tf.concat((state_fw[0].h, state_bw[0].h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " last_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - 
"metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { 
- "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.178726, avg accuracy: 0.880464\n", - "epoch: 2, avg loss: 0.830969, avg accuracy: 0.911327\n", - "epoch: 3, avg loss: 0.755527, avg accuracy: 0.912236\n", - "epoch: 4, avg loss: 0.737258, avg accuracy: 0.911273\n", - "epoch: 5, avg loss: 0.737195, avg accuracy: 0.911600\n", - "epoch: 6, avg loss: 0.717913, avg accuracy: 0.914209\n", - "epoch: 7, avg loss: 0.713073, avg accuracy: 0.914073\n", - "epoch: 8, avg loss: 0.711393, avg accuracy: 0.913709\n", - "epoch: 9, avg loss: 0.693806, avg accuracy: 0.915618\n", - "epoch: 10, avg loss: 0.693107, avg accuracy: 0.915264\n", - "epoch: 11, avg loss: 0.688313, avg accuracy: 0.915145\n", - "epoch: 12, avg loss: 0.690973, avg accuracy: 0.914191\n", - "epoch: 13, avg loss: 0.673017, avg accuracy: 0.916609\n", - "epoch: 14, avg loss: 0.667826, avg accuracy: 0.916745\n", - "epoch: 15, avg loss: 0.664243, avg accuracy: 0.917191\n", - "epoch: 16, avg loss: 0.656463, avg accuracy: 0.918100\n", - "epoch: 17, avg loss: 0.663125, avg accuracy: 0.917100\n", - "epoch: 18, avg loss: 0.652407, avg accuracy: 0.918582\n", - "epoch: 19, avg loss: 0.656035, avg accuracy: 0.917273\n", - "epoch: 20, avg loss: 0.654984, avg accuracy: 0.917655\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= 
(len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: and here 's this really sophisticated technology coming down the road , all these associated social , moral , ethical questions , and we scientists are just lousy at explaining to the public just exactly what it is we 're doing in those labs .\n", - "REAL ANSWER: đây là một công nghệ tinh vi đi cùng với nó là những câu hỏi về mặt xã hội , đạo đức và đạo lý , và các nhà khoa học thì rất dở trong việc giải thích với công chúng một cách chính xác họ đang làm gì trong phòng thí nghiệm .\n", - "PREDICTED ANSWER: và chúng , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , \n", - "\n", - "row 2\n", - "QUESTION: there were a lot of risks involved that they talked about during the informed consent portion .\n", - "REAL ANSWER: có rất nhiều nguy cơ liên quan mà họ nói đến trong phần thông báo sự chấp thuận .\n", - "PREDICTED ANSWER: và tôi tôi tôi , , , , , , , , , , , , . . . \n", - "\n", - "row 3\n", - "QUESTION: and all of those pages were reviewed by another 400-plus scientists and reviewers , from 113 countries .\n", - "REAL ANSWER: và tất cả các trang đều được xem xét bởi 400 khoa học gia và nhà phê bình khác từ 113 quốc gia .\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , , , , , , , . . . \n", - "\n", - "row 4\n", - "QUESTION: looked like a ladybug , right ?\n", - "REAL ANSWER: trông như một con bọ hung nhỉ ?\n", - "PREDICTED ANSWER: tôi tôi tôi tôi tôi . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/22.basic-birnn-seq2seq-luong.ipynb b/neural-machine-translation/22.basic-birnn-seq2seq-luong.ipynb new file mode 100644 index 0000000..e15c4c2 --- /dev/null +++ b/neural-machine-translation/22.basic-birnn-seq2seq-luong.ipynb @@ -0,0 +1,726 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + 
"test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + 
" inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", 
+ " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future 
version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :48: 
MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :58: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 13691, 10105, 5362, 983, 3946, 29907, 9611, 22279,\n", + " 3771, 3324, 4185, 31432, 21555, 20708, 17033, 3745, 26829,\n", + " 10066, 15098, 14858, 621, 215, 29544, 29168, 7308, 13059,\n", + " 16952, 5473, 21937, 22382, 16462, 18776, 650, 23121, 14648,\n", + " 14810]],\n", + " \n", + " [[ 1, 17706, 29991, 5990, 2758, 4079, 8629, 15508, 29278,\n", + " 27586, 8554, 
19925, 3941, 8639, 1686, 22819, 25298, 24542,\n", + " 14883, 11819, 14920, 14716, 28183, 26199, 23804, 21166, 12619,\n", + " 30366, 12710, 4325, 23569, 15348, 13164, 19060, 520, 17250,\n", + " 8971]],\n", + " \n", + " [[ 1, 31993, 20052, 17417, 12904, 24283, 5997, 24073, 9480,\n", + " 26793, 15535, 20869, 31067, 6052, 2171, 5045, 27366, 16694,\n", + " 14025, 4939, 1785, 5537, 2006, 12771, 2538, 25556, 7171,\n", + " 17865, 21301, 12486, 9620, 12676, 8931, 27596, 4073, 12117,\n", + " 12476]],\n", + " \n", + " [[ 1, 29716, 20042, 1588, 14416, 21446, 22338, 31658, 12499,\n", + " 20646, 5204, 1151, 18045, 18081, 10313, 1630, 14909, 3568,\n", + " 7451, 23252, 21007, 26028, 8571, 7445, 13694, 19286, 7639,\n", + " 4396, 21046, 7618, 10166, 3077, 8096, 7997, 11869, 25464,\n", + " 12918]],\n", + " \n", + " [[ 1, 14617, 12648, 27468, 29927, 23146, 17490, 31696, 26153,\n", + " 12212, 4165, 13246, 16011, 31024, 24874, 18846, 14061, 23114,\n", + " 27491, 23818, 28545, 10939, 27617, 17753, 13603, 27065, 19437,\n", + " 20216, 10296, 1068, 25988, 29975, 3774, 24402, 26595, 30516,\n", + " 9839]],\n", + " \n", + " [[ 1, 18639, 12036, 4321, 23662, 12292, 30149, 24705, 17850,\n", + " 6215, 10209, 29415, 2144, 9747, 29905, 1742, 12690, 31740,\n", + " 15449, 868, 27032, 24923, 1458, 28983, 14187, 5307, 24485,\n", + " 19363, 7439, 27140, 30311, 5465, 4276, 10435, 28339, 14630,\n", + " 25272]],\n", + " \n", + " [[ 1, 25544, 18217, 20029, 31387, 2378, 5696, 3497, 10020,\n", + " 4363, 25635, 31929, 28495, 174, 11684, 8211, 30545, 1561,\n", + " 13162, 445, 22563, 445, 7080, 30307, 27251, 28243, 9488,\n", + " 28869, 25327, 4187, 20405, 25612, 28309, 30277, 20192, 21356,\n", + " 1042]],\n", + " \n", + " [[ 1, 20179, 23662, 13301, 31282, 28974, 22904, 31906, 30450,\n", + " 16588, 12756, 12850, 26216, 14393, 22707, 24026, 22724, 12522,\n", + " 21864, 31545, 15872, 627, 6154, 19357, 27669, 13747, 16395,\n", + " 31069, 10452, 14126, 14184, 4631, 15877, 2348, 22246, 10136,\n", + " 23479]],\n", + " 
\n", + " [[ 1, 13428, 26784, 25339, 10187, 6993, 21760, 10375, 6117,\n", + " 15928, 7949, 15261, 21787, 11326, 9137, 31533, 353, 6609,\n", + " 15448, 18182, 31719, 23944, 21769, 1921, 12961, 23988, 11186,\n", + " 15126, 20304, 7158, 23720, 30617, 958, 16360, 3924, 29550,\n", + " 8421]],\n", + " \n", + " [[ 1, 16442, 3726, 25823, 11984, 30442, 22157, 21946, 28248,\n", + " 11764, 15524, 2117, 23203, 12350, 4962, 25314, 14136, 20139,\n", + " 10241, 13731, 27246, 14962, 2007, 12309, 11971, 20443, 27897,\n", + " 11266, 23606, 5813, 31769, 19284, 18263, 23084, 12150, 12011,\n", + " 1658]]], dtype=int32), 10.378456, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [08:58<00:00, 2.90it/s, accuracy=0.0848, cost=7.25]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 5.77it/s, accuracy=0.0968, cost=7.09]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 
3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/22.gru-birnn-seq2seq-bahdanau.ipynb b/neural-machine-translation/22.gru-birnn-seq2seq-bahdanau.ipynb deleted file mode 100644 index 06d1177..0000000 --- a/neural-machine-translation/22.gru-birnn-seq2seq-bahdanau.ipynb +++ /dev/null @@ -1,408 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with 
open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in 
data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def attention():\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, 
out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = attention(),\n", - " cell_bw = attention(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw[0],state_bw[0]), -1)\n", - " last_state = tuple([bi_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, 
len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.477565, avg accuracy: 0.883100\n", - "epoch: 2, avg loss: 0.768198, avg accuracy: 0.909345\n", - "epoch: 3, avg loss: 0.730443, avg accuracy: 0.913164\n", - "epoch: 4, avg loss: 0.732762, avg accuracy: 0.912455\n", - "epoch: 5, avg loss: 0.712641, avg accuracy: 0.914964\n", - "epoch: 6, avg loss: 0.711924, avg accuracy: 0.914645\n", - 
"epoch: 7, avg loss: 0.714578, avg accuracy: 0.914364\n", - "epoch: 8, avg loss: 0.687978, avg accuracy: 0.916218\n", - "epoch: 9, avg loss: 0.688235, avg accuracy: 0.915273\n", - "epoch: 10, avg loss: 0.678953, avg accuracy: 0.916364\n", - "epoch: 11, avg loss: 0.668476, avg accuracy: 0.916291\n", - "epoch: 12, avg loss: 0.665428, avg accuracy: 0.916345\n", - "epoch: 13, avg loss: 0.662611, avg accuracy: 0.915609\n", - "epoch: 14, avg loss: 0.644317, avg accuracy: 0.917982\n", - "epoch: 15, avg loss: 0.643702, avg accuracy: 0.917409\n", - "epoch: 16, avg loss: 0.627981, avg accuracy: 0.919155\n", - "epoch: 17, avg loss: 0.627175, avg accuracy: 0.918118\n", - "epoch: 18, avg loss: 0.614913, avg accuracy: 0.919591\n", - "epoch: 19, avg loss: 0.608737, avg accuracy: 0.920091\n", - "epoch: 20, avg loss: 0.603615, avg accuracy: 0.920555\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: as you may imagine , unfortunately , we were not ready .\n", - "REAL 
ANSWER: như bạn tưởng tượng , không may là chúng tôi chưa sẵn sàng .\n", - "PREDICTED ANSWER: chúng là tôi tôi , , , , , , tôi . . \n", - "\n", - "row 2\n", - "QUESTION: now , you might be asking yourself , " well , you know , what would i do in a biolab ? "\n", - "REAL ANSWER: bây giờ , có thể bạn đang tự hỏi , " bạn biết không , tôi sẽ làm gì trong một phòng thí nghiệm sinh học ? "\n", - "PREDICTED ANSWER: chúng là , , tôi tôi tôi , , , , , , , , , , tôi tôi tôi , , chúng , . . . \n", - "\n", - "row 3\n", - "QUESTION: by the time the pilot had finished , we had the names of a thousand schools that wished to join .\n", - "REAL ANSWER: ngay khi thử nghiệm kết thúc , chúng ta đã có tên của hàng ngàn trường học mong muốn được tham gia .\n", - "PREDICTED ANSWER: và là , tôi là của tôi tôi , , tôi , , , một , , , , , , . . \n", - "\n", - "row 4\n", - "QUESTION: geert chatrou : thank you . thank you .\n", - "REAL ANSWER: geert chatrou : cám ơn . cám ơn .\n", - "PREDICTED ANSWER: và tôi là tôi tôi . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/23.lstm-birnn-seq2seq-bahdanau-luong.ipynb b/neural-machine-translation/23.lstm-birnn-seq2seq-bahdanau-luong.ipynb deleted file mode 100644 index ff841f8..0000000 --- a/neural-machine-translation/23.lstm-birnn-seq2seq-bahdanau-luong.ipynb +++ /dev/null @@ -1,426 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in 
words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 
668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, 
-1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def bahdanau():\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - " \n", - " def luong():\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(),\n", - " cell_bw = luong(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw[0].c, state_bw[0].c), -1)\n", - " bi_state_h = tf.concat((state_fw[0].h, state_bw[0].h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " last_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = 
tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) 
* 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.010806, avg accuracy: 0.882545\n", - "epoch: 2, avg loss: 0.834663, avg accuracy: 0.910100\n", - "epoch: 3, avg loss: 0.763448, avg accuracy: 0.910609\n", - "epoch: 4, avg loss: 0.723663, avg accuracy: 0.913173\n", - "epoch: 5, avg loss: 0.717577, avg accuracy: 0.914864\n", - "epoch: 6, avg loss: 0.713918, avg accuracy: 0.914709\n", - "epoch: 7, avg loss: 0.724630, avg accuracy: 0.912364\n", - "epoch: 8, avg loss: 0.707776, avg accuracy: 0.914518\n", - "epoch: 9, avg loss: 0.695370, avg accuracy: 0.915718\n", - "epoch: 10, avg loss: 0.698285, avg accuracy: 0.914591\n", - "epoch: 11, avg loss: 0.683369, avg accuracy: 0.916464\n", - "epoch: 12, avg loss: 0.678766, avg accuracy: 0.915991\n", - "epoch: 13, avg loss: 0.673776, avg accuracy: 0.916345\n", - "epoch: 14, avg loss: 0.665838, avg accuracy: 0.917464\n", - "epoch: 15, avg loss: 0.663693, avg accuracy: 0.917427\n", - "epoch: 16, avg loss: 0.666812, avg accuracy: 0.917109\n", - "epoch: 17, avg loss: 0.659277, avg accuracy: 0.917427\n", - "epoch: 18, avg loss: 0.647385, avg accuracy: 0.919236\n", - "epoch: 19, avg loss: 0.653992, avg accuracy: 0.917982\n", - "epoch: 20, avg loss: 0.650577, avg accuracy: 0.918182\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, 
total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: this is such a new area , and as we say back in brooklyn , you ain 't seen nothin ' yet .\n", - "REAL ANSWER: đây là một lĩnh vực rất mới , và như chúng tôi nói ở brooklyn , bạn còn chưa thấy gì cả đâu . .\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , , , , , , . . . . \n", - "\n", - "row 2\n", - "QUESTION: and it has billions of interstitial spaces , and those spaces , along with the nanoparticles , reach up and grab the air molecules , and cover the surface with air .\n", - "REAL ANSWER: và có hàng tỉ khe hở giữa chúng , những kẽ hở này , và những phân tử nano chiếm lấy những phân tử không khí và bao phủ lớp ngoài bởi không khí .\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , , , , , , , , , , , , , , , , , . . . . \n", - "\n", - "row 3\n", - "QUESTION: this is the euphore smog chamber in spain .\n", - "REAL ANSWER: đây là phòng nghiên cứu khói bụi euphore ở tây ban nha .\n", - "PREDICTED ANSWER: chúng tôi tôi tôi , , , , , , . . 
\n", - "\n", - "row 4\n", - "QUESTION: what is a chubby , curly-haired guy from holland -- why is he whistling ?\n", - "REAL ANSWER: người đàn ông tròn trịa , tóc xoăn đến từ hà lan này là ai -- tại sao ông ấy lại huýt sáo ?\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , , , , , , . . . . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/23.lstm-birnn-seq2seq-luong.ipynb b/neural-machine-translation/23.lstm-birnn-seq2seq-luong.ipynb new file mode 100644 index 0000000..c3911ec --- /dev/null +++ b/neural-machine-translation/23.lstm-birnn-seq2seq-luong.ipynb @@ -0,0 +1,819 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], 
+ "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", 
+ " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " last_state = tuple([bi_lstm_state] * num_layers)\n", + " last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = 
pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", 
+ "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast 
rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :50: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :60: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + 
"cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 27771, 27771, 27771, 13502, 13502, 30219, 30219, 28610,\n", + " 17336, 17336, 7755, 29745, 4459, 4459, 4459, 4459, 25779,\n", + " 25779, 2263, 10845, 10845, 10680, 27817, 27817, 21931, 26562,\n", + " 26562, 27719, 27719, 27719, 27479, 27479, 27479, 21668, 31759,\n", + " 31759]],\n", + " \n", + " [[ 1, 31388, 10962, 13242, 11317, 11317, 11317, 11320, 11320,\n", + " 15165, 16966, 16966, 5524, 8865, 8865, 19347, 30700, 30700,\n", + " 6930, 30700, 30700, 30700, 6118, 6749, 23158, 23158, 23158,\n", + " 22782, 22782, 12943, 15284, 15284, 11040, 11849, 11849, 398,\n", + " 398]],\n", + " \n", + " [[ 1, 29379, 29379, 8504, 24196, 6459, 6459, 6459, 4381,\n", + " 4381, 12318, 12318, 24130, 24130, 24130, 17680, 13838, 13838,\n", + " 7512, 28951, 28951, 8979, 8979, 8979, 14612, 14612, 11896,\n", + " 11896, 11896, 11425, 11425, 27536, 27536, 13572, 30639, 30639,\n", + " 31777]],\n", + " \n", + " [[ 1, 23575, 3828, 23575, 27489, 27489, 17036, 8848, 28728,\n", + " 2015, 14827, 2499, 9591, 2499, 9591, 2203, 14518, 2203,\n", + " 18316, 28399, 21188, 21188, 21188, 8308, 8308, 8772, 8772,\n", + " 8772, 10045, 25825, 12010, 12010, 12010, 17731, 17731, 12010,\n", + " 30507]],\n", + " \n", + " [[ 1, 30853, 31178, 4576, 4576, 18256, 19260, 20354, 8851,\n", + " 8851, 20609, 20609, 20609, 20609, 24717, 24717, 24717, 24717,\n", + " 18291, 154, 16987, 154, 154, 16987, 7547, 15925, 18382,\n", + " 27810, 27810, 26893, 11768, 11768, 24770, 18279, 18279, 18279,\n", + " 20619]],\n", + " \n", + " [[ 1, 17901, 24351, 24351, 7508, 25815, 25815, 23409, 23409,\n", + " 23409, 16821, 6340, 6340, 6340, 983, 983, 7609, 7609,\n", + " 21496, 10266, 10266, 29705, 23822, 4694, 23016, 23016, 25639,\n", + " 25639, 25639, 13645, 20605, 20605, 11233, 11233, 1696, 26848,\n", + " 26848]],\n", + " \n", + " [[ 1, 5208, 5798, 5798, 5798, 9590, 9590, 985, 7680,\n", + " 7680, 26237, 29803, 
293, 25501, 293, 764, 8183, 22479,\n", + " 22479, 28099, 28099, 28099, 958, 958, 3836, 3836, 958,\n", + " 1862, 1862, 1862, 1862, 1862, 1785, 1785, 1785, 8853,\n", + " 8853]],\n", + " \n", + " [[ 1, 21904, 16507, 16507, 4301, 10294, 11584, 11256, 6648,\n", + " 6648, 5542, 7420, 7420, 28761, 28761, 13013, 10833, 10833,\n", + " 10833, 20438, 21904, 21904, 6958, 6958, 10309, 29560, 29560,\n", + " 29560, 6647, 6647, 187, 187, 27846, 27846, 27846, 20770,\n", + " 25509]],\n", + " \n", + " [[ 1, 29997, 3005, 3005, 18419, 23122, 23122, 18419, 18419,\n", + " 214, 11331, 9349, 9349, 9349, 9349, 21510, 21256, 21256,\n", + " 21256, 1196, 3563, 3563, 27767, 27767, 27767, 27767, 13086,\n", + " 13086, 13086, 13086, 13086, 13086, 27877, 27877, 27877, 27877,\n", + " 1872]],\n", + " \n", + " [[ 1, 28313, 28313, 17169, 17169, 14464, 17796, 17796, 2428,\n", + " 2428, 2428, 10442, 2179, 2179, 17302, 5220, 5220, 3163,\n", + " 3163, 11436, 11436, 11436, 11436, 6955, 28976, 28976, 19811,\n", + " 13937, 13937, 11414, 11414, 11414, 11414, 1161, 4585, 4585,\n", + " 13007]]], dtype=int32), 10.373442, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [12:25<00:00, 2.10it/s, accuracy=0.142, cost=6.19]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.63it/s, accuracy=0.134, cost=6.2] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + 
"rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.05320787" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/24.gru-birnn-seq2seq-bahdanau-luong.ipynb b/neural-machine-translation/24.gru-birnn-seq2seq-bahdanau-luong.ipynb deleted file mode 100644 index 7d5f8ea..0000000 --- a/neural-machine-translation/24.gru-birnn-seq2seq-bahdanau-luong.ipynb +++ /dev/null @@ -1,415 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in 
words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 
668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = 
tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " def bahdanau():\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - " \n", - " def luong():\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer//2, \n", - " memory = encoder_embedded)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size_layer//2), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer//2)\n", - "\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(),\n", - " cell_bw = luong(),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw[0],state_bw[0]), -1)\n", - " last_state = tuple([bi_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = 
learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - 
"metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.541561, avg accuracy: 0.882873\n", - "epoch: 2, avg loss: 0.748266, avg accuracy: 0.913055\n", - "epoch: 3, avg loss: 0.730411, avg accuracy: 0.913673\n", - "epoch: 4, avg loss: 0.742766, avg accuracy: 0.911855\n", - "epoch: 5, avg loss: 0.712603, avg accuracy: 0.914591\n", - "epoch: 6, avg loss: 0.717559, avg accuracy: 0.913409\n", - "epoch: 7, avg loss: 0.694839, avg accuracy: 0.915955\n", - "epoch: 8, avg loss: 0.686324, avg accuracy: 0.915800\n", - "epoch: 9, avg loss: 0.680164, avg accuracy: 0.915964\n", - "epoch: 10, avg loss: 0.672714, avg accuracy: 0.916291\n", - "epoch: 11, avg loss: 0.665730, avg accuracy: 0.916245\n", - "epoch: 12, avg loss: 0.667158, avg accuracy: 0.915236\n", - "epoch: 13, avg loss: 0.652497, avg accuracy: 0.917318\n", - "epoch: 14, avg loss: 0.649192, avg accuracy: 0.916700\n", - "epoch: 15, avg loss: 0.642666, avg accuracy: 0.916936\n", - "epoch: 16, avg loss: 0.628474, avg accuracy: 0.919345\n", - "epoch: 17, avg loss: 0.638074, avg accuracy: 0.916682\n", - "epoch: 18, avg loss: 0.626017, avg accuracy: 0.918773\n", - "epoch: 19, avg loss: 0.618065, avg accuracy: 0.919236\n", - "epoch: 20, avg loss: 0.608225, avg accuracy: 0.920045\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, 
seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: we have to fly at a special incline in order to make the measurements .\n", - "REAL ANSWER: chúng tôi phải bay với độ nghiêng đặc biệt để thực hiện các phép đo .\n", - "PREDICTED ANSWER: và tôi tôi tôi , , một một một một một của của của của . . \n", - "\n", - "row 2\n", - "QUESTION: so he 's mario . he 's our son .\n", - "REAL ANSWER: đây là mario . con trai chúng tôi .\n", - "PREDICTED ANSWER: và tôi là là . . \n", - "\n", - "row 3\n", - "QUESTION: the track that i will whistle is called " fête de la belle . "\n", - "REAL ANSWER: bản nhạt mà tôi sẽ huýt theo được gọi là " fête de la belle . "\n", - "PREDICTED ANSWER: và là là có tôi tôi tôi , , , " " " của của của . . \n", - "\n", - "row 4\n", - "QUESTION: a biohacker in germany , a journalist , wanted to know whose dog was leaving little presents on his street ?\n", - "REAL ANSWER: một nhà biohacker người đức , một nhà báo , muốn biết chó của ai đã để lại những " món quà " nho nhỏ trên đường ?\n", - "PREDICTED ANSWER: và một một một một , , , , , , , , , , và và và và của những của của của . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/24.gru-birnn-seq2seq-luong.ipynb b/neural-machine-translation/24.gru-birnn-seq2seq-luong.ipynb new file mode 100644 index 0000000..923375e --- /dev/null +++ b/neural-machine-translation/24.gru-birnn-seq2seq-luong.ipynb @@ -0,0 +1,820 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + 
"outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " batch_size = tf.shape(x)[0]\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " 
scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " last_output = tf.concat((out_fw,out_bw), -1)\n", + "\n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = last_output)\n", + " rnn_cells = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " initial_state = rnn_cells.zero_state(batch_size, tf.float32).clone(cell_state=last_state)\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = initial_state,\n", + " dtype = tf.float32)\n", + " \n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + 
" correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is 
equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * 
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :48: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :58: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 20652, 20652, 10810, 5540, 5355, 5355, 31771, 
31771,\n", + " 29764, 27073, 27073, 27073, 11704, 11704, 11704, 11704, 22388,\n", + " 22388, 22388, 8874, 8874, 8874, 1856, 27233, 27233, 27233,\n", + " 27233, 31395, 22128, 22128, 22128, 1055, 1055, 1055, 1055,\n", + " 3039]],\n", + " \n", + " [[ 1, 12246, 12246, 12246, 24651, 11029, 1100, 29880, 27075,\n", + " 29880, 13243, 12246, 6685, 6685, 6685, 6437, 6437, 6437,\n", + " 5052, 5052, 23369, 23369, 29648, 29435, 29435, 10550, 10550,\n", + " 13703, 10631, 10631, 10631, 10631, 10631, 1103, 9563, 9563,\n", + " 9563]],\n", + " \n", + " [[ 1, 3183, 15312, 3759, 22196, 19342, 19342, 19342, 28582,\n", + " 28582, 9340, 9340, 7594, 7594, 14267, 14267, 721, 19772,\n", + " 19772, 9263, 9263, 23839, 23839, 23839, 9263, 2969, 2969,\n", + " 2969, 22764, 9263, 22764, 9263, 27219, 27219, 27219, 27219,\n", + " 27219]],\n", + " \n", + " [[ 1, 31561, 31561, 31561, 15722, 612, 612, 27056, 28462,\n", + " 28462, 4865, 370, 3522, 7286, 9094, 9094, 22778, 22778,\n", + " 10574, 10574, 10574, 10574, 11005, 11005, 19382, 19382, 19382,\n", + " 23642, 30970, 30970, 30970, 30970, 9190, 6149, 9190, 9190,\n", + " 9190]],\n", + " \n", + " [[ 1, 9311, 24914, 24914, 15999, 15999, 31762, 31762, 31750,\n", + " 16957, 25963, 13733, 13733, 28114, 4601, 18975, 18975, 18975,\n", + " 3487, 3487, 21915, 21915, 6860, 6860, 19533, 12601, 12601,\n", + " 17425, 17425, 17425, 4052, 4052, 4052, 11497, 11497, 15841,\n", + " 15841]],\n", + " \n", + " [[ 1, 23145, 23145, 15066, 27457, 31561, 31561, 9909, 14178,\n", + " 14178, 29732, 29732, 14579, 14579, 13734, 13734, 2951, 2158,\n", + " 9173, 4596, 4596, 12257, 3332, 3332, 29760, 29760, 29760,\n", + " 14251, 14251, 282, 282, 30878, 30878, 27575, 27575, 11518,\n", + " 11518]],\n", + " \n", + " [[ 1, 6392, 4696, 22583, 22583, 22583, 1479, 1479, 1479,\n", + " 3233, 3233, 3233, 3233, 3588, 3588, 3607, 23170, 18461,\n", + " 18461, 19234, 31243, 31243, 24895, 24895, 7804, 6917, 6917,\n", + " 27315, 27315, 27315, 27315, 5367, 18717, 7025, 7025, 7025,\n", + " 
14113]],\n", + " \n", + " [[ 1, 15703, 15703, 20285, 3908, 1838, 1838, 1838, 5975,\n", + " 19133, 19133, 19133, 10342, 1825, 1825, 1825, 19786, 19786,\n", + " 19786, 19247, 19247, 19247, 15526, 4611, 4611, 15526, 31130,\n", + " 31130, 31130, 11182, 11182, 11182, 11182, 11182, 15732, 5364,\n", + " 31130]],\n", + " \n", + " [[ 1, 23070, 23070, 17032, 26381, 26381, 30808, 24581, 24581,\n", + " 17250, 17250, 17250, 15063, 15063, 17183, 19862, 30162, 30162,\n", + " 29751, 29353, 29353, 29353, 31743, 31743, 31743, 31743, 31743,\n", + " 19517, 19517, 21429, 21429, 23037, 23037, 29292, 29292, 28634,\n", + " 16432]],\n", + " \n", + " [[ 1, 19517, 4687, 14218, 14218, 26538, 26233, 26233, 26538,\n", + " 2671, 2671, 2671, 19753, 19753, 19753, 19753, 16998, 16998,\n", + " 7716, 7716, 7716, 7716, 7716, 7716, 11052, 11052, 1287,\n", + " 11052, 10503, 10503, 18788, 18788, 18788, 22247, 22247, 22247,\n", + " 6424]]], dtype=int32), 10.374498, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [12:32<00:00, 2.08it/s, accuracy=0.128, cost=6.31]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.58it/s, accuracy=0.14, cost=6.01] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + 
"metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.027758315" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/25.lstm-seq2seq-contrib-greedy-luong.ipynb b/neural-machine-translation/25.lstm-seq2seq-contrib-greedy-luong.ipynb new file mode 100644 index 0000000..f13a832 --- /dev/null +++ b/neural-machine-translation/25.lstm-seq2seq-contrib-greedy-luong.ipynb @@ -0,0 +1,796 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + 
"execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 
1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = 
tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. 
This can '\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 7138, 7138, 7138, 30926, 30926, 16700, 17819, 17819, 20326,\n", + " 20326, 13225, 20326, 13225, 18195, 18195, 18195, 23446, 23446,\n", + " 29536, 29536, 17493, 10321, 10321, 11592, 11592, 11592, 11592,\n", + " 22077, 22077, 22077, 22077, 9846, 9846, 15087, 15087, 15087,\n", + " 15087, 4958, 9770, 14318, 13092, 13092, 13092, 17021, 17021,\n", + " 22868, 12202, 30796, 30796, 8231, 8231, 487, 487, 487,\n", + " 14234, 19256, 19256, 16703, 16703, 16703, 11533, 11533, 23770,\n", + " 20250, 17932, 17932, 17932, 17932, 16036, 16036, 16036, 16036],\n", + " [ 4810, 21134, 21134, 12884, 12884, 2289, 23218, 12934, 12934,\n", + " 30626, 7836, 7836, 18581, 31832, 31832, 27922, 27922, 24678,\n", + " 18531, 29712, 29712, 91, 24416, 24416, 24416, 24416, 24416,\n", + " 9180, 6844, 6844, 6844, 6844, 30512, 30512, 30512, 30512,\n", + " 30512, 15476, 15476, 3490, 3490, 3490, 11097, 11097, 10073,\n", + " 10073, 10073, 10073, 22736, 22736, 20183, 20183, 13079, 13983,\n", + " 13983, 8609, 25598, 24308, 24308, 4286, 4286, 14724, 14724,\n", + " 29875, 29875, 29875, 7420, 7420, 7420, 7420, 7420, 7420],\n", + " [12409, 16914, 4032, 21129, 4032, 19797, 17947, 17947, 26180,\n", + " 26180, 26235, 29470, 2256, 18503, 17039, 17039, 17039, 29536,\n", + " 29536, 4129, 4129, 22773, 22773, 21197, 21197, 29536, 28891,\n", + " 28891, 28891, 28891, 3700, 3700, 3700, 6681, 6681, 6618,\n", + " 6618, 6618, 13490, 13490, 13490, 13490, 16780, 16780, 16780,\n", + " 
16780, 16780, 16780, 17713, 17713, 7017, 2015, 7017, 6553,\n", + " 10561, 10561, 16309, 31561, 10371, 10371, 10371, 28050, 24597,\n", + " 24597, 6286, 6286, 1912, 1912, 1912, 1912, 21678, 936],\n", + " [14627, 7653, 14627, 5135, 18905, 18905, 24607, 13086, 13086,\n", + " 27983, 24607, 6276, 20658, 20658, 20658, 22980, 22980, 22980,\n", + " 22980, 7526, 7526, 7526, 7526, 7526, 5962, 5962, 5962,\n", + " 5962, 15027, 15027, 15027, 15027, 15027, 15027, 15027, 15027,\n", + " 17515, 17515, 17515, 17040, 17040, 20994, 20994, 20994, 20994,\n", + " 4339, 4339, 4339, 4339, 4339, 4339, 31999, 31999, 31999,\n", + " 31999, 31999, 14586, 18158, 31813, 18158, 18158, 18158, 18158,\n", + " 18158, 243, 243, 21429, 21429, 21429, 11348, 23026, 16238],\n", + " [28192, 4294, 4294, 3967, 3967, 16510, 3379, 3379, 3379,\n", + " 6393, 20132, 12081, 12081, 12081, 765, 13339, 13339, 14770,\n", + " 27041, 9413, 9413, 18627, 18627, 12311, 12311, 26237, 26237,\n", + " 26237, 15904, 31221, 2866, 2866, 4513, 4513, 4513, 9238,\n", + " 9238, 4513, 4513, 31277, 31277, 31277, 25142, 5405, 30288,\n", + " 30288, 17677, 17677, 17677, 11563, 13202, 26252, 26252, 17801,\n", + " 17801, 25065, 10161, 25065, 25065, 12822, 12822, 12822, 14821,\n", + " 14821, 14821, 14821, 8028, 8028, 4584, 15205, 15205, 15205],\n", + " [ 2499, 2499, 13178, 13178, 2652, 2652, 26255, 26255, 26255,\n", + " 31645, 29284, 22348, 22348, 22348, 31645, 6906, 6906, 6906,\n", + " 31775, 31775, 29038, 29038, 10288, 24091, 24091, 23660, 23660,\n", + " 23660, 26043, 26043, 2827, 2827, 16549, 16549, 29668, 31261,\n", + " 30741, 27645, 31111, 23255, 23255, 23255, 23255, 5283, 5283,\n", + " 5283, 10945, 10945, 10945, 10945, 10945, 31194, 31194, 31194,\n", + " 31194, 31194, 31194, 31194, 7347, 7347, 7347, 7347, 3683,\n", + " 3683, 30140, 30140, 18604, 18604, 6210, 11213, 11213, 7719],\n", + " [25637, 14718, 14718, 27833, 27833, 11007, 11007, 5411, 5411,\n", + " 5411, 25637, 25637, 29430, 29430, 29430, 25213, 25213, 24206,\n", + " 6751, 6751, 
1278, 1278, 1278, 13664, 7840, 30487, 30487,\n", + " 30487, 30487, 16505, 29060, 15463, 15463, 15463, 16024, 16024,\n", + " 19299, 16700, 16700, 16737, 5864, 20509, 20509, 20509, 20509,\n", + " 20042, 18633, 18633, 27572, 27572, 27572, 16700, 25029, 25029,\n", + " 25029, 25029, 19787, 19787, 19787, 19787, 17236, 17236, 9719,\n", + " 3825, 3825, 3825, 3825, 3825, 29173, 29173, 21505, 8762],\n", + " [ 2214, 2214, 2214, 14644, 18424, 18424, 28385, 28884, 28884,\n", + " 28884, 5925, 5925, 5925, 5925, 328, 328, 328, 28869,\n", + " 22185, 26287, 26287, 9605, 27854, 3847, 19388, 7742, 26473,\n", + " 29625, 29625, 13886, 13886, 13886, 13464, 13464, 23525, 28324,\n", + " 14461, 2613, 2613, 12488, 12488, 12488, 12488, 12488, 10071,\n", + " 10071, 11971, 11971, 11971, 11971, 29192, 29192, 28220, 7380,\n", + " 7380, 22200, 22200, 22200, 19388, 19388, 19388, 15756, 15756,\n", + " 15756, 15756, 18015, 18015, 18015, 14225, 14225, 14225, 14225],\n", + " [24173, 26084, 26084, 2549, 2549, 2549, 2086, 2086, 2086,\n", + " 16047, 20860, 21861, 21861, 20860, 20860, 30391, 30391, 30391,\n", + " 30391, 30391, 23377, 23377, 19940, 19940, 19940, 19940, 27413,\n", + " 6813, 6813, 6813, 4088, 5802, 5802, 5802, 7955, 328,\n", + " 328, 14662, 328, 14662, 23479, 23479, 2694, 2694, 12343,\n", + " 12343, 12246, 9099, 9099, 9099, 9099, 9099, 9099, 9099,\n", + " 9099, 9099, 13511, 13511, 13511, 13511, 13511, 13511, 13511,\n", + " 26239, 26239, 26239, 26239, 591, 10189, 21886, 22159, 22159],\n", + " [31378, 31378, 31378, 4600, 4600, 44, 5765, 29365, 16260,\n", + " 16260, 16260, 16260, 9260, 9260, 9260, 9260, 9260, 9260,\n", + " 9260, 9260, 9260, 16169, 16169, 16169, 16169, 2216, 19900,\n", + " 19900, 19900, 19900, 16039, 12566, 8899, 12566, 9162, 9162,\n", + " 9162, 9162, 27213, 9162, 27151, 27151, 31120, 31120, 2786,\n", + " 2786, 31120, 18206, 18206, 6514, 6514, 6514, 6514, 6514,\n", + " 2786, 14847, 14847, 6514, 6514, 6514, 6514, 6514, 6514,\n", + " 2786, 30042, 6514, 22796, 28786, 28786, 20674, 
20674, 20674]],\n", + " dtype=int32), 10.3734, 0.0]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:48<00:00, 2.41it/s, accuracy=0.332, cost=4.11]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.97it/s, accuracy=0.36, cost=3.77] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.15195806" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/25.lstm-seq2seq-greedy-luong.ipynb 
b/neural-machine-translation/25.lstm-seq2seq-greedy-luong.ipynb deleted file mode 100644 index 499d45d..0000000 --- a/neural-machine-translation/25.lstm-seq2seq-greedy-luong.ipynb +++ /dev/null @@ -1,409 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab 
from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - 
}, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5):\n", - " \n", - " def lstm_cell(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = 
attention(self.encoder_out, self.X_seq_len)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = 
tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - 
"metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.605726, avg accuracy: 0.051259\n", - "epoch: 2, avg loss: 6.106843, avg accuracy: 0.076966\n", - "epoch: 3, avg loss: 5.953540, avg accuracy: 0.099004\n", - "epoch: 4, avg loss: 5.837621, avg accuracy: 0.112893\n", - "epoch: 5, avg loss: 5.713791, avg accuracy: 0.119072\n", - "epoch: 6, avg loss: 5.579746, avg accuracy: 0.125937\n", - "epoch: 7, avg loss: 5.422983, avg accuracy: 0.134735\n", - "epoch: 8, avg loss: 5.266945, avg accuracy: 0.141316\n", - "epoch: 9, avg loss: 5.112668, avg accuracy: 0.150027\n", - "epoch: 10, avg loss: 5.011777, avg accuracy: 0.155969\n", - "epoch: 11, avg loss: 4.861548, avg accuracy: 0.159504\n", - "epoch: 12, avg loss: 4.657542, avg accuracy: 0.173969\n", - "epoch: 13, avg loss: 4.431281, avg accuracy: 0.194260\n", - "epoch: 14, avg loss: 4.240695, avg accuracy: 0.216166\n", - "epoch: 15, avg loss: 4.059111, avg accuracy: 0.232692\n", - "epoch: 16, avg loss: 3.888312, avg accuracy: 0.255748\n", - "epoch: 17, avg loss: 3.701045, avg accuracy: 0.279659\n", - "epoch: 18, avg loss: 3.547139, avg accuracy: 0.296897\n", - "epoch: 19, avg loss: 3.341076, avg accuracy: 0.328141\n", - "epoch: 20, avg loss: 3.109340, avg accuracy: 0.364322\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " 
print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi có đau mọi mọi người nữa , tôi sẽ hỏi bạn sẽ đau máy máy nữa ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn được đau cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi giống giống giống các tài tài tài đó , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/26.gru-seq2seq-contrib-greedy-luong.ipynb b/neural-machine-translation/26.gru-seq2seq-contrib-greedy-luong.ipynb new file mode 100644 index 0000000..f7de86f --- /dev/null +++ b/neural-machine-translation/26.gru-seq2seq-contrib-greedy-luong.ipynb @@ -0,0 +1,826 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = 
data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in 
range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = 
tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :35: dynamic_rnn 
(from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If 
you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[18354, 13600, 13600, 13600, 417, 16219, 16219, 16219, 1859,\n", + " 1859, 22031, 23211, 22031, 9058, 9058, 30228, 30228, 9058,\n", + " 28363, 23269, 23269, 23269, 5272, 5272, 5272, 25428, 21558,\n", + " 21558, 21558, 29628, 26549, 27415, 27415, 27415, 28114, 483,\n", + " 28114, 6313, 1044, 1044, 26406, 26406, 26406, 26406, 26406,\n", + " 29222, 29222, 11350, 8810, 18488, 18488, 27101, 27101, 27101,\n", + " 26135, 26135, 26058, 26058, 11819, 11819, 25234, 3246, 3246,\n", + " 3246, 3246, 3246, 3246, 3246, 3246, 17864, 17864, 28098],\n", + " [17357, 17357, 26720, 26720, 5869, 19413, 19413, 19413, 19413,\n", + " 19413, 20363, 20363, 12041, 12041, 12041, 12577, 12480, 10066,\n", + " 10066, 10066, 13367, 13367, 20976, 20976, 17677, 0, 0,\n", + " 2342, 2342, 14368, 14368, 14368, 5885, 5885, 5885, 24093,\n", + " 24093, 7841, 20845, 20845, 20845, 20845, 20845, 29149, 4008,\n", + " 4008, 9356, 31351, 31351, 9605, 9605, 9605, 9605, 9605,\n", + " 11919, 11919, 11919, 11147, 11147, 14682, 14682, 15056, 14682,\n", + " 14682, 2500, 2500, 2500, 2500, 2500, 2500, 2500, 2500],\n", + " [27355, 7923, 24749, 23714, 22542, 22542, 22542, 23714, 31549,\n", + " 31549, 18849, 18849, 18849, 7569, 25943, 25943, 18849, 18849,\n", + " 18849, 9707, 9707, 6211, 30083, 30083, 25823, 25823, 2194,\n", + " 31166, 31166, 31166, 31166, 31166, 31166, 389, 389, 389,\n", + " 389, 27909, 30856, 
30856, 30856, 25961, 7584, 18412, 18412,\n", + " 5784, 5784, 5784, 5784, 31100, 31100, 31100, 13682, 13682,\n", + " 13682, 11347, 11347, 11347, 15767, 15767, 15767, 15767, 15767,\n", + " 14487, 14487, 31726, 31726, 31726, 31726, 31481, 31481, 31481],\n", + " [ 1103, 20351, 20351, 1080, 8737, 21044, 23855, 23319, 6154,\n", + " 6154, 6154, 6154, 19943, 19943, 19943, 30277, 2099, 2099,\n", + " 2099, 2099, 31513, 18343, 18343, 18343, 18343, 10423, 10423,\n", + " 10423, 26923, 26923, 26923, 26923, 2027, 3405, 3405, 3405,\n", + " 4794, 4794, 1008, 1008, 1008, 1008, 5064, 20534, 29626,\n", + " 7646, 7646, 7646, 16014, 16014, 9113, 28412, 28412, 28412,\n", + " 28412, 3025, 3025, 9113, 28619, 28619, 3822, 3822, 6014,\n", + " 28619, 6014, 6014, 17794, 11274, 11274, 30679, 30679, 30679],\n", + " [ 7521, 10529, 25824, 25824, 5332, 3069, 3069, 24733, 24733,\n", + " 24733, 24733, 24733, 11689, 15593, 15593, 15593, 18130, 18130,\n", + " 9532, 9532, 145, 145, 145, 145, 25494, 17207, 17207,\n", + " 1703, 14073, 14073, 14073, 14073, 16355, 16355, 16355, 6099,\n", + " 4387, 4387, 6099, 6099, 17598, 17598, 17598, 31189, 31189,\n", + " 6541, 4387, 29400, 22214, 22214, 4387, 13901, 22214, 8632,\n", + " 622, 622, 622, 11282, 11282, 15028, 19879, 30249, 30249,\n", + " 30249, 30249, 30249, 30249, 30249, 30249, 16381, 3334, 3334],\n", + " [10013, 10013, 1901, 8588, 8588, 8588, 8588, 8588, 8588,\n", + " 8588, 8588, 28382, 24272, 24272, 24272, 9432, 9432, 9432,\n", + " 19742, 4340, 19042, 19042, 27848, 27588, 27588, 27848, 20624,\n", + " 20624, 20624, 20624, 6605, 6605, 6605, 6605, 6605, 30705,\n", + " 18215, 18215, 12454, 12454, 12454, 12454, 12454, 10049, 10049,\n", + " 22313, 22313, 22313, 23218, 23218, 15510, 15510, 15510, 24146,\n", + " 24146, 1739, 1739, 1739, 1739, 11645, 31941, 31941, 11645,\n", + " 11645, 31941, 31941, 1149, 8867, 1149, 329, 15094, 15094],\n", + " [19985, 3017, 7412, 7412, 7412, 7412, 8582, 5077, 22409,\n", + " 12552, 12552, 7775, 24167, 24167, 24167, 5833, 5833, 
11465,\n", + " 11465, 11465, 11465, 11465, 21544, 21544, 21544, 21544, 19222,\n", + " 19222, 19222, 19222, 19222, 28927, 28927, 10437, 10437, 15837,\n", + " 15837, 21544, 21544, 26441, 26441, 7754, 7754, 7754, 15899,\n", + " 15899, 15899, 13096, 13096, 13096, 13096, 13096, 13096, 13096,\n", + " 13096, 9716, 9716, 24091, 24091, 24091, 24091, 24091, 24091,\n", + " 24091, 24091, 24091, 26110, 15048, 15048, 19724, 19724, 27131],\n", + " [ 1945, 30950, 790, 5929, 19974, 19974, 12724, 12724, 12724,\n", + " 19400, 19400, 19400, 31886, 8447, 8447, 8447, 16083, 19218,\n", + " 18833, 30237, 30237, 10209, 31885, 8814, 8814, 8814, 8814,\n", + " 4746, 4746, 13466, 7568, 7568, 7568, 7568, 7568, 9525,\n", + " 15695, 15695, 15695, 20126, 20126, 20126, 30214, 15695, 29650,\n", + " 29650, 15695, 15151, 15151, 29044, 24874, 24874, 29650, 19721,\n", + " 14559, 24874, 14559, 14559, 18815, 18815, 23854, 23854, 23854,\n", + " 23854, 5198, 5198, 12385, 24223, 24223, 31726, 31726, 31726],\n", + " [31332, 2869, 22133, 12945, 12945, 27891, 27891, 27891, 16106,\n", + " 16106, 14835, 11270, 11270, 11270, 22149, 22149, 22149, 26137,\n", + " 22149, 15542, 1815, 22149, 22149, 22149, 22149, 19699, 19699,\n", + " 22149, 22149, 22149, 24813, 24813, 17844, 29595, 29595, 13768,\n", + " 13768, 13768, 13768, 13768, 13768, 13768, 27156, 29003, 29003,\n", + " 29003, 29003, 19401, 19401, 19401, 19401, 19401, 24235, 24235,\n", + " 24235, 24235, 5108, 5108, 6172, 6172, 6172, 25695, 25695,\n", + " 25695, 25695, 25695, 8646, 8646, 8646, 8646, 14172, 14172],\n", + " [20513, 20513, 20513, 17418, 24530, 24530, 24530, 24530, 24530,\n", + " 24530, 589, 589, 589, 9715, 9715, 589, 15599, 19544,\n", + " 19544, 26378, 9711, 9711, 9711, 363, 363, 363, 13269,\n", + " 13269, 13269, 3359, 3359, 4684, 4684, 4684, 4684, 10318,\n", + " 16508, 16508, 22001, 22677, 29310, 29310, 29310, 29310, 29957,\n", + " 29957, 29957, 13847, 2918, 4648, 15389, 24477, 18083, 18083,\n", + " 18083, 18083, 7398, 7398, 7398, 7398, 7398, 7398, 
7398,\n", + " 10063, 10063, 10063, 21786, 24231, 21786, 13535, 25279, 6954]],\n", + " dtype=int32), 10.373433, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:36<00:00, 2.45it/s, accuracy=0.289, cost=4.42]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.95it/s, accuracy=0.328, cost=3.9] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.101576895" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git 
a/neural-machine-translation/26.gru-seq2seq-greedy-luong.ipynb b/neural-machine-translation/26.gru-seq2seq-greedy-luong.ipynb deleted file mode 100644 index 08dad56..0000000 --- a/neural-machine-translation/26.gru-seq2seq-greedy-luong.ipynb +++ /dev/null @@ -1,400 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - 
"for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", - " \n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = 
attention(self.encoder_out, self.X_seq_len)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = 
tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - 
"metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.577193, avg accuracy: 0.053858\n", - "epoch: 2, avg loss: 6.130342, avg accuracy: 0.066933\n", - "epoch: 3, avg loss: 5.923388, avg accuracy: 0.092064\n", - "epoch: 4, avg loss: 5.723215, avg accuracy: 0.115031\n", - "epoch: 5, avg loss: 5.551992, avg accuracy: 0.125058\n", - "epoch: 6, avg loss: 5.385205, avg accuracy: 0.133757\n", - "epoch: 7, avg loss: 5.173327, avg accuracy: 0.146041\n", - "epoch: 8, avg loss: 5.008676, avg accuracy: 0.154629\n", - "epoch: 9, avg loss: 4.766738, avg accuracy: 0.170606\n", - "epoch: 10, avg loss: 4.484841, avg accuracy: 0.190839\n", - "epoch: 11, avg loss: 4.201257, avg accuracy: 0.226365\n", - "epoch: 12, avg loss: 3.887028, avg accuracy: 0.254086\n", - "epoch: 13, avg loss: 3.548252, avg accuracy: 0.308555\n", - "epoch: 14, avg loss: 3.235637, avg accuracy: 0.359798\n", - "epoch: 15, avg loss: 2.984371, avg accuracy: 0.395088\n", - "epoch: 16, avg loss: 2.714506, avg accuracy: 0.441075\n", - "epoch: 17, avg loss: 2.483916, avg accuracy: 0.478835\n", - "epoch: 18, avg loss: 2.211511, avg accuracy: 0.534566\n", - "epoch: 19, avg loss: 1.941007, avg accuracy: 0.583770\n", - "epoch: 20, avg loss: 1.727307, avg accuracy: 0.627814\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " 
print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/27.lstm-seq2seq-contrib-greedy-bahdanau.ipynb b/neural-machine-translation/27.lstm-seq2seq-contrib-greedy-bahdanau.ipynb new file mode 100644 index 0000000..8d9d82f --- /dev/null +++ b/neural-machine-translation/27.lstm-seq2seq-contrib-greedy-bahdanau.ipynb @@ -0,0 +1,816 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + 
"execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = 
tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = 
self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :35: dynamic_rnn (from tensorflow.python.ops.rnn) is 
deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + 
"pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[29723, 29723, 23181, 23181, 29268, 4585, 29268, 29268, 16742,\n", + " 16742, 16742, 24199, 1312, 1312, 10641, 10641, 10641, 10641,\n", + " 29719, 30530, 30530, 30530, 5436, 14157, 14157, 14157, 16840,\n", + " 16840, 16840, 25082, 18035, 4266, 19400, 19400, 19400, 19400,\n", + " 19400, 1148, 1148, 1148, 17393, 22315, 22315, 13996, 2580,\n", + " 25967, 25967, 25967, 17342, 17342, 17342, 17342, 21829, 21829,\n", + " 21829, 21829, 21829, 21829, 22869, 26436, 26436, 26436, 26436,\n", + " 24282, 24282, 29652, 29652, 29723, 29723, 29723, 6952, 6952],\n", + " [ 8351, 5191, 5191, 13356, 13356, 27006, 27006, 27006, 5260,\n", + " 5260, 5260, 28490, 28490, 6496, 19141, 10, 10, 10,\n", + " 8926, 8926, 8926, 937, 3960, 3960, 7688, 4137, 4137,\n", + " 4137, 864, 864, 15733, 15733, 15733, 1055, 11341, 11341,\n", + " 11341, 11341, 11341, 9648, 26148, 216, 216, 5074, 9648,\n", + " 31336, 31336, 9648, 21286, 21286, 21286, 21286, 21286, 27056,\n", + " 27056, 15054, 15054, 4228, 15054, 15054, 3265, 3265, 10387,\n", + " 14249, 29939, 29939, 29939, 29939, 11016, 11016, 11016, 9941],\n", + " [26073, 28928, 8404, 8404, 31741, 31741, 17452, 17452, 2586,\n", + " 2586, 31829, 12656, 12656, 12656, 2703, 14069, 27348, 27348,\n", + " 27348, 27348, 1839, 1839, 5554, 5554, 5554, 8612, 8612,\n", + " 8612, 27035, 27035, 27035, 27035, 27035, 2626, 2626, 2626,\n", + " 2626, 23731, 18394, 18394, 1446, 1446, 1446, 1446, 16717,\n", + " 11024, 11024, 6605, 17412, 17412, 17412, 17412, 2429, 26959,\n", + " 7882, 20362, 20362, 18713, 18713, 18713, 612, 23140, 23140,\n", + " 1175, 1175, 22776, 20924, 20924, 22776, 17461, 17461, 22776],\n", + " [31144, 18145, 26490, 26490, 12937, 12937, 29833, 29833, 30406,\n", + " 18186, 12713, 18186, 18186, 29413, 29413, 13940, 13940, 13940,\n", + " 25893, 25893, 25893, 
25893, 14157, 14157, 14157, 1294, 1294,\n", + " 13122, 13122, 13122, 13122, 13122, 15741, 9096, 10108, 3381,\n", + " 6426, 6426, 6426, 6426, 6426, 6426, 1761, 1761, 1761,\n", + " 1761, 6473, 6473, 12959, 12959, 12959, 4000, 4000, 4000,\n", + " 4000, 4000, 4000, 4000, 4000, 4000, 4648, 29397, 29397,\n", + " 29397, 14070, 14070, 14070, 14070, 14070, 14070, 9867, 9867],\n", + " [14859, 24553, 24553, 569, 569, 3872, 3872, 18800, 18800,\n", + " 13599, 13599, 18735, 18735, 25262, 24171, 7671, 7671, 7671,\n", + " 9569, 9569, 29624, 9569, 31084, 31084, 31084, 1805, 14622,\n", + " 31867, 21916, 14622, 14622, 16638, 16638, 12522, 12522, 12522,\n", + " 30494, 21274, 2076, 2976, 2976, 25971, 15392, 22784, 18053,\n", + " 721, 721, 21517, 28990, 6378, 6378, 1847, 1847, 1847,\n", + " 1847, 23533, 23533, 23533, 23533, 23533, 31781, 29435, 29435,\n", + " 29435, 25322, 25322, 28046, 28046, 30767, 30767, 16723, 16723],\n", + " [ 6348, 6348, 6348, 6348, 20722, 14319, 28185, 28185, 28185,\n", + " 28185, 25672, 25672, 25672, 25672, 29537, 29537, 15580, 24610,\n", + " 24610, 350, 26668, 26668, 350, 350, 26668, 8395, 8395,\n", + " 14067, 1165, 1165, 1165, 29945, 29945, 29945, 6626, 6218,\n", + " 6218, 6218, 6218, 6218, 6218, 11444, 23634, 11444, 27653,\n", + " 27653, 11444, 18022, 18022, 26386, 9044, 9044, 9044, 21163,\n", + " 13393, 3978, 3978, 30170, 30170, 30170, 17659, 17659, 17659,\n", + " 18290, 18290, 12625, 14170, 17551, 17551, 4897, 1692, 4897],\n", + " [ 7941, 7744, 7744, 409, 409, 409, 8595, 8595, 8595,\n", + " 8595, 1487, 1487, 1487, 12907, 25911, 25911, 10814, 10814,\n", + " 10814, 14761, 14761, 14761, 14761, 11468, 11468, 29101, 24146,\n", + " 24146, 24146, 3731, 3731, 3731, 3731, 25589, 25589, 25589,\n", + " 25589, 25589, 25589, 29540, 8716, 8716, 8716, 2809, 25208,\n", + " 2809, 25208, 6875, 6875, 6875, 5466, 5466, 10367, 1731,\n", + " 1731, 1731, 1731, 19281, 31850, 31850, 31850, 26881, 4581,\n", + " 4581, 4581, 4940, 4940, 28660, 25482, 26545, 26545, 19281],\n", + " 
[29164, 13273, 16725, 10716, 13496, 13496, 20769, 20769, 12844,\n", + " 8927, 8927, 3878, 3878, 3878, 3878, 3878, 23067, 23067,\n", + " 23067, 8690, 8690, 27371, 27371, 27371, 27371, 32, 32,\n", + " 25858, 25858, 25858, 25858, 12959, 23718, 23718, 23718, 23630,\n", + " 23630, 19574, 19574, 19574, 19574, 1104, 1104, 1104, 10251,\n", + " 10251, 10251, 10251, 10251, 10311, 10311, 10311, 1621, 1621,\n", + " 1621, 23053, 23053, 23511, 23511, 771, 6120, 18422, 18422,\n", + " 1961, 1961, 1961, 1961, 24975, 19929, 27007, 6630, 6630],\n", + " [17853, 21023, 21023, 7339, 7339, 7339, 22928, 22928, 22928,\n", + " 12902, 12902, 16564, 16564, 20089, 20089, 11980, 11980, 17462,\n", + " 17462, 17462, 13544, 13544, 25360, 25360, 25360, 25360, 21353,\n", + " 21353, 21353, 21353, 16010, 16010, 4963, 4963, 4963, 8733,\n", + " 8733, 22476, 22476, 4339, 4339, 22578, 19067, 29544, 3842,\n", + " 19936, 19936, 19936, 19936, 2252, 5976, 5976, 28212, 17417,\n", + " 12989, 12989, 6439, 1053, 1053, 1053, 1053, 5871, 14363,\n", + " 22868, 19232, 18035, 12293, 435, 435, 31549, 31549, 31549],\n", + " [24580, 30924, 29205, 29205, 12959, 21872, 9141, 9141, 3611,\n", + " 3611, 21876, 21876, 2699, 2699, 2699, 18109, 21493, 21493,\n", + " 19706, 19706, 7460, 7460, 14298, 14298, 14298, 14298, 14298,\n", + " 14298, 3362, 18401, 3362, 3362, 15856, 7536, 9027, 9027,\n", + " 9027, 16488, 16488, 31673, 31673, 10712, 22869, 12742, 12742,\n", + " 31356, 1773, 1773, 25638, 25638, 25638, 11952, 30649, 21020,\n", + " 21020, 21020, 21020, 12736, 12736, 12736, 12736, 12736, 12736,\n", + " 12736, 20227, 20227, 20227, 12466, 12466, 12466, 12466, 26671]],\n", + " dtype=int32), 10.373332, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = 
{model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:11<00:00, 2.33it/s, accuracy=0.327, cost=4.09]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.97it/s, accuracy=0.382, cost=3.64]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.15275387" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/27.lstm-seq2seq-greedy-bahdanau.ipynb b/neural-machine-translation/27.lstm-seq2seq-greedy-bahdanau.ipynb deleted file mode 100644 index 02bfc89..0000000 --- a/neural-machine-translation/27.lstm-seq2seq-greedy-bahdanau.ipynb +++ /dev/null @@ -1,401 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import 
collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from 
= build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5):\n", - " \n", - " def lstm_cell(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer, initializer=tf.orthogonal_initializer(),\n", - " reuse=reuse)\n", - " \n", - " def 
attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = attention(self.encoder_out, self.X_seq_len)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = 
decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - 
"num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.571634, avg accuracy: 0.049735\n", - "epoch: 2, avg loss: 6.121673, avg accuracy: 0.070271\n", - "epoch: 3, avg loss: 5.955901, avg accuracy: 0.093381\n", - "epoch: 4, avg loss: 5.830159, avg accuracy: 0.108793\n", - "epoch: 5, avg loss: 5.719205, avg accuracy: 0.119038\n", - "epoch: 6, avg loss: 5.574082, avg accuracy: 0.125461\n", - "epoch: 
7, avg loss: 5.406810, avg accuracy: 0.130707\n", - "epoch: 8, avg loss: 5.233235, avg accuracy: 0.135666\n", - "epoch: 9, avg loss: 5.081944, avg accuracy: 0.150223\n", - "epoch: 10, avg loss: 4.984887, avg accuracy: 0.151207\n", - "epoch: 11, avg loss: 4.772585, avg accuracy: 0.163427\n", - "epoch: 12, avg loss: 4.557869, avg accuracy: 0.178095\n", - "epoch: 13, avg loss: 4.353389, avg accuracy: 0.202588\n", - "epoch: 14, avg loss: 4.154468, avg accuracy: 0.219880\n", - "epoch: 15, avg loss: 3.997348, avg accuracy: 0.230640\n", - "epoch: 16, avg loss: 3.821373, avg accuracy: 0.257113\n", - "epoch: 17, avg loss: 3.598599, avg accuracy: 0.283166\n", - "epoch: 18, avg loss: 3.340350, avg accuracy: 0.329164\n", - "epoch: 19, avg loss: 3.153779, avg accuracy: 0.357183\n", - "epoch: 20, avg loss: 3.001533, avg accuracy: 0.378199\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? 
đau như thế nào ?\n", - "PREDICTED ANSWER: sau sau sau sau sau sau mọi người nữa , tôi sẽ hỏi mọi người nữa , họ ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa bạn sẽ chọn cái nào , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này này này làm tiếp thí nghiệm này . \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau sau sau sau tài nghiên cứu hay nghiên cứu hay trợ , tôi nhận thêm nguồn tài trợ , và tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/28.gru-seq2seq-contrib-greedy-bahdanau.ipynb b/neural-machine-translation/28.gru-seq2seq-contrib-greedy-bahdanau.ipynb new file mode 100644 index 0000000..f3f20dc --- /dev/null +++ b/neural-machine-translation/28.gru-seq2seq-contrib-greedy-bahdanau.ipynb @@ -0,0 +1,826 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + 
"execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " 
sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " 
weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :32: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :35: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + 
"Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + 
"source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[28438, 30412, 19979, 31675, 19979, 19979, 4931, 18471, 3676,\n", + " 18471, 702, 29117, 21126, 21126, 21126, 21126, 29142, 29142,\n", + " 15474, 15474, 5174, 5174, 17397, 17397, 9708, 9708, 9708,\n", + " 9708, 13248, 13248, 13248, 8677, 21951, 9403, 9403, 31742,\n", + " 31742, 31742, 22656, 22656, 22656, 13919, 13919, 13919, 13919,\n", + " 13919, 13919, 8182, 6140, 8182, 8182, 8182, 8182, 4640,\n", + " 4640, 4640, 2338, 2338, 2338, 2338, 19056, 19056, 19056,\n", + " 19056, 19056, 2338, 2338, 2338, 2338, 2338, 2338, 3642],\n", + " [11324, 3639, 17358, 21230, 20101, 2521, 2521, 1366, 22217,\n", + " 22217, 23592, 23592, 23592, 23592, 23592, 23592, 23592, 6352,\n", + " 6352, 6352, 6352, 6352, 6352, 6352, 6352, 6352, 1485,\n", + " 1485, 10620, 10620, 18399, 18399, 6949, 6949, 25067, 325,\n", + " 11187, 11187, 22271, 22271, 9057, 9057, 9057, 9057, 9057,\n", + " 9057, 2036, 2036, 8337, 8337, 8337, 8337, 24709, 24709,\n", + " 24709, 2381, 2838, 2838, 2838, 2838, 2838, 2838, 2838,\n", + " 26916, 26916, 26916, 26916, 26916, 26916, 9572, 9572, 7095],\n", + " [15964, 212, 212, 18372, 19850, 19850, 1812, 1812, 1812,\n", + " 1812, 23011, 4036, 23011, 12462, 12462, 12462, 12462, 12462,\n", + " 705, 705, 705, 705, 705, 705, 23175, 23175, 23175,\n", + " 23175, 23001, 23001, 23001, 23001, 23001, 23001, 23001, 23001,\n", + " 23001, 23001, 26633, 1587, 5074, 5920, 5920, 5920, 5920,\n", + " 5920, 5920, 8314, 8314, 8314, 8314, 1157, 8738, 15751,\n", + " 15751, 15751, 
15751, 15751, 15751, 29501, 29501, 23515, 23515,\n", + " 23515, 23515, 8674, 26168, 31608, 26955, 19317, 18169, 18169],\n", + " [18008, 18008, 30386, 30386, 30386, 30386, 19271, 6613, 6613,\n", + " 14650, 14650, 2407, 2407, 1374, 1374, 1672, 903, 903,\n", + " 903, 25049, 19054, 19054, 24470, 24470, 7563, 7563, 7563,\n", + " 9938, 17159, 17159, 17159, 26375, 26375, 1866, 17227, 17227,\n", + " 17227, 17227, 10954, 1204, 1204, 11284, 11284, 30134, 30134,\n", + " 30134, 30134, 25044, 22156, 22156, 22156, 22156, 22156, 22156,\n", + " 22487, 22487, 22487, 17181, 19056, 19056, 19056, 19056, 682,\n", + " 682, 682, 682, 31898, 24326, 24326, 13055, 12546, 12546],\n", + " [21112, 10447, 31905, 20350, 20350, 3935, 3935, 3935, 3935,\n", + " 3935, 23412, 3857, 3857, 14414, 5004, 4986, 4986, 4986,\n", + " 4986, 15958, 21338, 28082, 21480, 15377, 374, 374, 2447,\n", + " 16906, 16906, 16906, 27895, 27895, 23389, 2201, 2201, 2201,\n", + " 11450, 30854, 13565, 14997, 14997, 14997, 267, 267, 267,\n", + " 7347, 779, 779, 7993, 779, 779, 17478, 14062, 14062,\n", + " 14062, 3696, 3696, 3696, 3696, 11469, 10599, 10599, 10599,\n", + " 10599, 10599, 2460, 24425, 11552, 11552, 10599, 10599, 21332],\n", + " [12400, 12400, 12720, 12720, 10943, 10943, 299, 7903, 7903,\n", + " 3785, 11887, 11887, 11887, 11887, 20945, 20945, 14523, 20945,\n", + " 14523, 11538, 10826, 10826, 14860, 14860, 14860, 22725, 22725,\n", + " 24919, 16409, 16409, 16711, 16711, 16711, 19101, 16711, 16711,\n", + " 16711, 16711, 16711, 24290, 18655, 18655, 18655, 18655, 18655,\n", + " 13287, 13287, 13081, 13081, 8723, 8723, 26720, 1356, 1356,\n", + " 1356, 1356, 10588, 10588, 10588, 1533, 2386, 2386, 2386,\n", + " 2386, 2386, 2386, 2386, 2386, 2386, 2386, 16031, 16031],\n", + " [ 3796, 3668, 3668, 18685, 18685, 18685, 18685, 12558, 12558,\n", + " 4548, 22690, 22690, 5043, 1587, 1587, 1587, 9472, 9472,\n", + " 20330, 20330, 20330, 1266, 456, 1266, 30158, 30158, 30158,\n", + " 30158, 30158, 30158, 17794, 9066, 13010, 13010, 
13010, 17043,\n", + " 30168, 15748, 15748, 15748, 15748, 15748, 25323, 25323, 25323,\n", + " 18794, 18794, 18794, 18794, 31651, 4422, 12582, 12582, 12582,\n", + " 22976, 22976, 2024, 2024, 26671, 26122, 26122, 3638, 317,\n", + " 317, 317, 317, 317, 16588, 16588, 16588, 16588, 16588],\n", + " [29794, 29794, 21923, 30481, 30481, 12602, 22017, 22017, 22017,\n", + " 22017, 22017, 22017, 5753, 5273, 5753, 5273, 9889, 9889,\n", + " 5891, 21840, 21840, 21840, 21840, 31806, 18903, 18903, 18903,\n", + " 30659, 30659, 31806, 31806, 11018, 11018, 31580, 22093, 22093,\n", + " 22093, 516, 516, 27968, 27968, 27968, 21217, 21217, 11699,\n", + " 21217, 21217, 21217, 21217, 31076, 31076, 3474, 3474, 3474,\n", + " 27769, 19665, 19665, 27676, 19665, 27676, 22215, 22215, 22215,\n", + " 21754, 21754, 21754, 5556, 5556, 5169, 5169, 5169, 11985],\n", + " [23549, 3037, 3037, 29815, 29815, 29815, 7993, 5421, 5421,\n", + " 5421, 5421, 5421, 5421, 7052, 7052, 29664, 29664, 1699,\n", + " 1699, 30012, 30012, 3407, 3407, 30064, 21308, 21308, 21308,\n", + " 21308, 6191, 6191, 6191, 5769, 5769, 5769, 9138, 29500,\n", + " 24331, 24331, 24331, 15260, 15260, 15260, 15260, 1357, 1357,\n", + " 1357, 1357, 4034, 4034, 4034, 4034, 4034, 19071, 19071,\n", + " 4034, 11158, 15088, 4034, 16448, 16448, 17541, 17541, 17541,\n", + " 17541, 22135, 31737, 31737, 31737, 14925, 10714, 10714, 10714],\n", + " [19467, 19467, 19467, 19467, 24276, 24276, 24276, 24276, 7779,\n", + " 5008, 28362, 2842, 2842, 2842, 25545, 749, 749, 749,\n", + " 20980, 15851, 8377, 8377, 13520, 13520, 13520, 13520, 13520,\n", + " 13520, 5289, 12334, 12334, 12334, 3608, 3608, 3608, 3608,\n", + " 9353, 9353, 15619, 15619, 15619, 15619, 30081, 30081, 30081,\n", + " 10815, 10815, 8348, 8348, 8348, 30514, 30514, 30514, 23608,\n", + " 23608, 31638, 13193, 13193, 3321, 3321, 3321, 22288, 22288,\n", + " 22288, 23221, 21424, 21424, 21424, 26446, 11256, 11256, 11256]],\n", + " dtype=int32), 10.372772, 0.0]" + ] + }, + "execution_count": 11, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:02<00:00, 2.36it/s, accuracy=0.39, cost=3.71] \n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.86it/s, accuracy=0.435, cost=3.29]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.13868862" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/28.gru-seq2seq-greedy-bahdanau.ipynb b/neural-machine-translation/28.gru-seq2seq-greedy-bahdanau.ipynb deleted file mode 100644 
index cf03134..0000000 --- a/neural-machine-translation/28.gru-seq2seq-greedy-bahdanau.ipynb +++ /dev/null @@ -1,400 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), 
('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - 
"outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", - " \n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = attention(self.encoder_out, self.X_seq_len)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = 
tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, 
masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.582618, 
avg accuracy: 0.054946\n", - "epoch: 2, avg loss: 6.114680, avg accuracy: 0.082961\n", - "epoch: 3, avg loss: 5.967686, avg accuracy: 0.095073\n", - "epoch: 4, avg loss: 5.784970, avg accuracy: 0.116433\n", - "epoch: 5, avg loss: 5.635812, avg accuracy: 0.122000\n", - "epoch: 6, avg loss: 5.520928, avg accuracy: 0.126284\n", - "epoch: 7, avg loss: 5.308779, avg accuracy: 0.134689\n", - "epoch: 8, avg loss: 5.022196, avg accuracy: 0.150369\n", - "epoch: 9, avg loss: 4.739151, avg accuracy: 0.167817\n", - "epoch: 10, avg loss: 4.512145, avg accuracy: 0.190857\n", - "epoch: 11, avg loss: 4.290044, avg accuracy: 0.207300\n", - "epoch: 12, avg loss: 4.009138, avg accuracy: 0.231932\n", - "epoch: 13, avg loss: 3.802213, avg accuracy: 0.253560\n", - "epoch: 14, avg loss: 3.562338, avg accuracy: 0.293943\n", - "epoch: 15, avg loss: 3.295523, avg accuracy: 0.332365\n", - "epoch: 16, avg loss: 3.098837, avg accuracy: 0.367908\n", - "epoch: 17, avg loss: 2.733501, avg accuracy: 0.421269\n", - "epoch: 18, avg loss: 2.505281, avg accuracy: 0.461923\n", - "epoch: 19, avg loss: 2.384220, avg accuracy: 0.476441\n", - "epoch: 20, avg loss: 2.385950, avg accuracy: 0.470696\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - 
"execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau thôi thôi thôi nữa nữa nữa nữa nữa nữa ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu chọn chọn giữa 2 giữa 2 nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và và sau đó , giống các tài tài nghiên cứu hay khác , và tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/29.lstm-seq2seq-beam.ipynb b/neural-machine-translation/29.lstm-seq2seq-beam.ipynb deleted file mode 100644 index 2230d04..0000000 --- a/neural-machine-translation/29.lstm-seq2seq-beam.ipynb +++ /dev/null @@ -1,403 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", 
- " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 
'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def lstm_cell(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer, reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " 
inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([lstm_cell() for _ in range(num_layers)])\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 0.5,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " \n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cells,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = tf.contrib.seq2seq.tile_batch(self.encoder_state, beam_width),\n", - " beam_width = beam_width,\n", - " output_layer = dense_layer,\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = 
predicting_decoder_output.predicted_ids[:, :, 0]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.619541, avg accuracy: 0.050701\n", - "epoch: 2, avg loss: 6.101795, avg accuracy: 0.072961\n", - "epoch: 3, avg loss: 6.021844, avg accuracy: 0.077485\n", - "epoch: 4, avg loss: 5.975375, avg accuracy: 0.085354\n", - "epoch: 5, avg loss: 5.939439, avg accuracy: 0.088880\n", - "epoch: 6, avg loss: 5.900819, avg accuracy: 0.093938\n", - "epoch: 7, avg loss: 5.870201, avg accuracy: 0.093786\n", - "epoch: 8, avg loss: 5.851087, avg accuracy: 0.095044\n", - "epoch: 9, avg loss: 5.819005, avg accuracy: 0.096579\n", - "epoch: 10, avg 
loss: 5.792674, avg accuracy: 0.099617\n", - "epoch: 11, avg loss: 5.760992, avg accuracy: 0.104751\n", - "epoch: 12, avg loss: 5.738297, avg accuracy: 0.105277\n", - "epoch: 13, avg loss: 5.714179, avg accuracy: 0.106252\n", - "epoch: 14, avg loss: 5.691054, avg accuracy: 0.107201\n", - "epoch: 15, avg loss: 5.667152, avg accuracy: 0.110339\n", - "epoch: 16, avg loss: 5.615595, avg accuracy: 0.113929\n", - "epoch: 17, avg loss: 5.578788, avg accuracy: 0.116033\n", - "epoch: 18, avg loss: 5.548088, avg accuracy: 0.118841\n", - "epoch: 19, avg loss: 5.501182, avg accuracy: 0.120766\n", - "epoch: 20, avg loss: 5.469031, avg accuracy: 0.122135\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi , chúng tôi , chúng tôi , chúng tôi , chúng tôi . 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: tôi tôi tôi , tôi tôi . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tôi , tôi tôi , tôi tôi , tôi , \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi , chúng tôi , chúng tôi có thể , tôi tôi , và tôi , và tôi . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/29.lstm-seq2seq-contrib-beam-luong.ipynb b/neural-machine-translation/29.lstm-seq2seq-contrib-beam-luong.ipynb new file mode 100644 index 0000000..d483943 --- /dev/null +++ b/neural-machine-translation/29.lstm-seq2seq-contrib-beam-luong.ipynb @@ -0,0 +1,882 @@ +{ + "cells": [ + { + "cell_type": "code", + 
"execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return 
tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with 
tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + 
"size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 2621, 1264, 1264, 1264, 1264, 16636, 16636, 16636, 16636,\n", + " 14378, 14378, 14378, 14378, 19760, 19760, 19760, 21908, 21908,\n", + " 21908, 21908, 21908, 13366, 13366, 13366, 13366, 448, 448,\n", + " 448, 22883, 22883, 22883, 22883, 29429, 29429, 29429, 24660,\n", + " 24660, 24660, 24660, 31500, 31500, 8616, 8881, 8881, 3118,\n", + " 3118, 3118, 2160, 2160, 2160, 17274, 17274, 17274, 17274,\n", + " 28792, 28792, 28792, 28792, 28792, 28792, 28713, 28713, 28713,\n", + " 28713, 28713, 28713, 4153, 4153, 4153, 4153, 23355, 23355],\n", + " [ 9986, 3404, 3111, 3111, 3111, 8273, 8273, 8273, 21183,\n", + " 21183, 21183, 28782, 28782, 7649, 7649, 7649, 7649, 23993,\n", + " 23993, 1843, 1843, 5667, 5667, 5667, 5667, 5667, 5667,\n", + " 5667, 13397, 13397, 5667, 13397, 13397, 13397, 13397, 13397,\n", + " 21492, 21492, 21492, 2193, 9981, 9981, 9981, 9981, 9981,\n", + " 9981, 9981, 26799, 27094, 27094, 5556, 5556, 5556, 5556,\n", + " 5556, 29429, 29429, 29429, 29429, 25948, 25948, 9419, 20635,\n", + " 10751, 10751, 10751, 10751, 10751, 10751, 23070, 8088, 26614],\n", + " [ 4860, 4860, 4255, 4255, 4255, 7899, 29686, 29686, 29686,\n", + " 29686, 21008, 21008, 21008, 16002, 16002, 16002, 16002, 6665,\n", + " 6665, 6665, 6665, 
6665, 6665, 22162, 22100, 22100, 7722,\n", + " 7722, 7722, 1678, 1678, 16839, 16839, 7476, 7476, 7476,\n", + " 7476, 18819, 18819, 18819, 18106, 18106, 18106, 18106, 18106,\n", + " 2917, 2917, 31932, 31932, 27238, 10883, 9603, 10753, 10753,\n", + " 10753, 23251, 18450, 18450, 25404, 16475, 16475, 16277, 21834,\n", + " 21834, 21834, 21834, 30948, 19357, 14917, 14917, 14917, 18421],\n", + " [24829, 24829, 26984, 18819, 18819, 18819, 30718, 30718, 16618,\n", + " 16618, 16618, 26208, 26208, 26208, 364, 23624, 23624, 23624,\n", + " 23624, 23624, 21052, 21052, 12045, 12045, 12045, 12045, 12045,\n", + " 1843, 1843, 1843, 1843, 1843, 8852, 8852, 4420, 4420,\n", + " 4420, 4420, 1497, 21999, 21999, 16622, 3856, 3856, 3856,\n", + " 3856, 1707, 29823, 29823, 20269, 20269, 20269, 25760, 25760,\n", + " 25760, 25760, 807, 807, 10013, 10013, 11858, 11858, 11858,\n", + " 11858, 11858, 11858, 11858, 30044, 30044, 30044, 30044, 30044],\n", + " [ 1403, 4829, 4829, 4829, 4829, 20663, 20663, 19101, 20663,\n", + " 7416, 7416, 7416, 11801, 11801, 18380, 17518, 11581, 18858,\n", + " 18858, 11581, 18858, 18858, 18858, 10224, 10224, 10224, 10224,\n", + " 10224, 6184, 6184, 6184, 19654, 24538, 24538, 24538, 24538,\n", + " 2007, 2007, 2007, 2007, 17342, 5407, 16018, 16018, 4015,\n", + " 4015, 4015, 4015, 4015, 5677, 982, 982, 30588, 5043,\n", + " 5043, 5043, 5043, 5043, 5043, 5043, 5043, 5043, 11627,\n", + " 11627, 11903, 11903, 11903, 11903, 11903, 11903, 11903, 11903],\n", + " [ 8771, 8771, 8771, 8771, 8771, 8771, 8771, 17624, 17624,\n", + " 16176, 16176, 16176, 16176, 6804, 6804, 25034, 25034, 25034,\n", + " 25034, 6884, 30518, 30518, 30518, 30518, 28688, 28688, 17660,\n", + " 9333, 9333, 30083, 30083, 30083, 15811, 15811, 9295, 9295,\n", + " 22614, 29182, 29182, 29182, 29182, 29182, 29182, 3842, 3842,\n", + " 3842, 15502, 15502, 22015, 22015, 15502, 22015, 22015, 8208,\n", + " 8208, 31583, 31583, 27460, 27460, 31583, 31583, 4636, 4636,\n", + " 4636, 4636, 4636, 12851, 15119, 19792, 19792, 
19792, 19792],\n", + " [20026, 20026, 20026, 20026, 24192, 24192, 24192, 20899, 20899,\n", + " 20899, 8732, 30671, 30671, 26461, 26461, 26461, 26461, 6542,\n", + " 6542, 6542, 6542, 26257, 26257, 26257, 12980, 12980, 12980,\n", + " 12980, 12980, 12980, 10886, 10886, 10886, 10886, 10886, 26756,\n", + " 26756, 11327, 11327, 11327, 11327, 11327, 11327, 2178, 2178,\n", + " 11327, 2178, 2178, 2178, 1264, 1264, 1264, 1264, 1264,\n", + " 1264, 15627, 15627, 15627, 15627, 23802, 23802, 23802, 23802,\n", + " 23802, 23802, 23802, 3077, 3077, 1722, 7633, 3375, 3375],\n", + " [22197, 22197, 16953, 16953, 16953, 16953, 15268, 26173, 26173,\n", + " 26173, 30909, 30909, 8516, 8516, 8516, 25629, 23270, 23270,\n", + " 23270, 23270, 25128, 22140, 22140, 21079, 21079, 21079, 21079,\n", + " 21079, 12334, 12334, 12334, 12334, 12334, 21129, 21129, 21129,\n", + " 21129, 21129, 11611, 1531, 1531, 1531, 1531, 1531, 1531,\n", + " 1531, 1531, 17501, 17501, 27061, 27061, 17709, 17709, 5029,\n", + " 5029, 5029, 5029, 5029, 11777, 21666, 21666, 21666, 21666,\n", + " 1115, 1115, 1115, 1115, 1115, 15186, 15186, 21959, 21959],\n", + " [11646, 15335, 16834, 4607, 4607, 4607, 4607, 846, 846,\n", + " 846, 24604, 27544, 27544, 27544, 27544, 4774, 4774, 4774,\n", + " 4774, 9576, 9576, 9576, 9576, 9576, 30742, 30742, 30742,\n", + " 30742, 30557, 30557, 30557, 21078, 21586, 21586, 27571, 21586,\n", + " 27571, 27571, 23894, 23894, 19990, 19990, 19990, 19990, 4633,\n", + " 4633, 4633, 4633, 16593, 16593, 12997, 12997, 12997, 12997,\n", + " 21271, 21271, 21271, 21271, 14570, 31072, 31072, 31072, 13971,\n", + " 13971, 11326, 11326, 11326, 11326, 11326, 11326, 11326, 11326],\n", + " [ 6765, 3061, 22218, 22218, 6391, 6391, 6391, 23610, 23610,\n", + " 23610, 23610, 141, 141, 141, 141, 141, 141, 141,\n", + " 141, 10857, 10857, 10857, 141, 28050, 28050, 28050, 28050,\n", + " 28050, 29592, 29592, 24009, 31800, 31800, 23303, 23303, 12114,\n", + " 12114, 12114, 12114, 30874, 30874, 12114, 30874, 30874, 30874,\n", + 
" 30874, 30874, 30874, 30874, 24383, 24383, 24383, 21404, 21404,\n", + " 21404, 6237, 6237, 6237, 21404, 2497, 19717, 2497, 19717,\n", + " 25832, 25832, 25832, 2254, 2254, 2254, 2254, 2254, 1680]],\n", + " dtype=int32), 10.373835, 0.0]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:48<00:00, 2.41it/s, accuracy=0.336, cost=4.07]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.88it/s, accuracy=0.344, cost=3.8] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.17535137" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "([943,\n", + " 2610,\n", + " 409,\n", + " 925,\n", + " 9,\n", + " 346,\n", + " 289,\n", + " 3373,\n", + " 264,\n", + " 648,\n", + " 30773,\n", + " 391,\n", + " 9514,\n", + " 10050,\n", + " 18603,\n", + " 445,\n", + " 289,\n", + " 1451,\n", + " 325,\n", + " 3299,\n", + " 312,\n", + " 289,\n", + " 
2196,\n", + " 317,\n", + " 1656,\n", + " 28736,\n", + " 12770,\n", + " 1561,\n", + " 336,\n", + " 346,\n", + " 321,\n", + " 17566,\n", + " 11387],\n", + " [648,\n", + " 671,\n", + " 409,\n", + " 3421,\n", + " 610,\n", + " 346,\n", + " 289,\n", + " 4084,\n", + " 264,\n", + " 648,\n", + " 30773,\n", + " 3437,\n", + " 300,\n", + " 9514,\n", + " 10050,\n", + " 18603,\n", + " 376,\n", + " 289,\n", + " 1451,\n", + " 325,\n", + " 3299,\n", + " 312,\n", + " 289,\n", + " 2196,\n", + " 317,\n", + " 1656,\n", + " 28736,\n", + " 26,\n", + " 1561,\n", + " 336,\n", + " 346,\n", + " 321,\n", + " 6341,\n", + " 11387])" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rights[0], results[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/3.gru-seq2seq-manual.ipynb b/neural-machine-translation/3.gru-seq2seq-manual.ipynb deleted file mode 100644 index 2322aaf..0000000 --- a/neural-machine-translation/3.gru-seq2seq-manual.ipynb +++ /dev/null @@ -1,384 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - 
" count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', 
data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " 
encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", - " dtype = tf.float32)\n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": 
"code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.551168, avg accuracy: 0.884691\n", - "epoch: 2, avg loss: 0.789407, avg accuracy: 0.910264\n", - "epoch: 3, avg loss: 0.734137, avg accuracy: 0.913064\n", - "epoch: 4, avg loss: 0.729298, avg accuracy: 0.913427\n", - "epoch: 5, avg loss: 0.724402, avg accuracy: 0.913400\n", - "epoch: 6, avg loss: 0.714839, avg 
accuracy: 0.914309\n", - "epoch: 7, avg loss: 0.716842, avg accuracy: 0.913573\n", - "epoch: 8, avg loss: 0.698991, avg accuracy: 0.915791\n", - "epoch: 9, avg loss: 0.697042, avg accuracy: 0.915373\n", - "epoch: 10, avg loss: 0.686981, avg accuracy: 0.915836\n", - "epoch: 11, avg loss: 0.690688, avg accuracy: 0.914136\n", - "epoch: 12, avg loss: 0.669089, avg accuracy: 0.917045\n", - "epoch: 13, avg loss: 0.658896, avg accuracy: 0.917655\n", - "epoch: 14, avg loss: 0.666537, avg accuracy: 0.915727\n", - "epoch: 15, avg loss: 0.657109, avg accuracy: 0.916336\n", - "epoch: 16, avg loss: 0.657437, avg accuracy: 0.916582\n", - "epoch: 17, avg loss: 0.637726, avg accuracy: 0.917873\n", - "epoch: 18, avg loss: 0.643969, avg accuracy: 0.916145\n", - "epoch: 19, avg loss: 0.619257, avg accuracy: 0.919318\n", - "epoch: 20, avg loss: 0.612224, avg accuracy: 0.920200\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: and as they wrote and debated , rather than 
seeing the films as artifacts , they began to see themselves .\n", - "REAL ANSWER: và khi chúng viết và tranh luận , hơn là thấy những bộ phim như là những tạo tác , chúng bắt đầu nhìn thấy bản thân .\n", - "PREDICTED ANSWER: và tôi tôi tôi , , , , , , và , , , , , , , , , , và và và . . . . \n", - "\n", - "row 2\n", - "QUESTION: rachel pike : the science behind a climate headline\n", - "REAL ANSWER: khoa học đằng sau một tiêu đề về khí hậu\n", - "PREDICTED ANSWER: và có là là là của của của của . . \n", - "\n", - "row 3\n", - "QUESTION: and i just couldn 't .\n", - "REAL ANSWER: nhưng tôi cứ không thể dừng được .\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi tôi . . \n", - "\n", - "row 4\n", - "QUESTION: that report was written by 620 scientists from 40 countries .\n", - "REAL ANSWER: nghiên cứu được viết bởi 620 nhà khoa học từ 40 quốc gia khác nhau .\n", - "PREDICTED ANSWER: và có là là là là , , của của của của . . . . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/3.gru-seq2seq.ipynb 
b/neural-machine-translation/3.gru-seq2seq.ipynb new file mode 100644 index 0000000..841640d --- /dev/null +++ b/neural-machine-translation/3.gru-seq2seq.ipynb @@ -0,0 +1,783 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, 
[None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " rnn_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " _, last_state = tf.nn.dynamic_rnn(rnn_cells, encoder_embedded,\n", + " sequence_length=X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = 
learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + 
"WARNING:tensorflow:From :28: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :31: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + 
"WARNING:tensorflow:From :39: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 1699, 1699, 25298, 13174, 13174, 20508, 20508, 27483,\n", + " 27483, 16300, 16300, 30602, 30602, 30602, 30602, 17594, 17594,\n", + " 17594, 13492, 13492, 13492, 24412, 24412, 24498, 14195, 3280,\n", + " 
3280, 14195, 26668, 26668, 11589, 28701, 4993, 10499, 10499,\n", + " 10499]],\n", + " \n", + " [[ 1, 4774, 4774, 4774, 12851, 26979, 20549, 20549, 20549,\n", + " 20549, 20549, 31327, 31327, 5790, 5578, 5578, 22044, 23720,\n", + " 6754, 6754, 13434, 18976, 18976, 9528, 9528, 15676, 15676,\n", + " 31673, 31673, 30900, 30900, 30900, 3430, 3430, 3430, 25223,\n", + " 25223]],\n", + " \n", + " [[ 1, 6792, 29592, 29592, 26695, 8311, 3801, 3801, 12131,\n", + " 14019, 8283, 7502, 7502, 7502, 8292, 8292, 8292, 8953,\n", + " 8953, 4081, 4081, 1219, 27847, 27478, 27478, 27478, 27478,\n", + " 2991, 18961, 18961, 18961, 28795, 28795, 11372, 11319, 11319,\n", + " 11319]],\n", + " \n", + " [[ 1, 14047, 13766, 13766, 21572, 21572, 21572, 9470, 29270,\n", + " 29270, 29270, 29270, 29270, 29270, 29270, 29270, 8503, 12957,\n", + " 12957, 15432, 22491, 22491, 29845, 29845, 29845, 29845, 30964,\n", + " 2339, 2339, 23023, 23023, 4824, 4824, 4824, 27164, 27164,\n", + " 30847]],\n", + " \n", + " [[ 1, 19814, 19814, 7490, 7490, 28382, 17712, 6460, 6460,\n", + " 2601, 2601, 2601, 2601, 20538, 19267, 19267, 30283, 30283,\n", + " 30283, 30283, 25068, 25068, 25068, 25068, 23501, 23501, 23501,\n", + " 31300, 31300, 12276, 12276, 12276, 12276, 12276, 8503, 8503,\n", + " 8503]],\n", + " \n", + " [[ 1, 7372, 2973, 2973, 2973, 10024, 23110, 23110, 3482,\n", + " 3482, 3482, 24776, 24776, 16503, 13338, 13338, 13338, 13338,\n", + " 13338, 13338, 28709, 28709, 22350, 22350, 22350, 17968, 13032,\n", + " 23589, 23589, 1738, 1738, 1738, 1738, 16882, 16882, 1738,\n", + " 4076]],\n", + " \n", + " [[ 1, 5171, 881, 881, 242, 4930, 4930, 4930, 30,\n", + " 30, 14999, 14999, 14999, 24861, 24861, 24861, 7867, 7867,\n", + " 7867, 7867, 14651, 14651, 31168, 31168, 31168, 10739, 10739,\n", + " 10739, 10739, 1542, 30434, 20070, 16599, 16599, 16599, 16692,\n", + " 22524]],\n", + " \n", + " [[ 1, 2315, 2315, 2315, 21240, 25521, 21240, 28889, 28889,\n", + " 28889, 28889, 15486, 21715, 21715, 21715, 3580, 3580, 3580,\n", + 
" 3580, 15582, 15582, 15582, 15582, 14999, 14999, 10857, 10857,\n", + " 10857, 14999, 14999, 10857, 15432, 15432, 24015, 8530, 28617,\n", + " 28617]],\n", + " \n", + " [[ 1, 12908, 26838, 26838, 20206, 20206, 1132, 6848, 6848,\n", + " 2506, 2506, 11849, 11849, 10894, 25315, 15065, 15065, 15065,\n", + " 25078, 25078, 25078, 21837, 21837, 15861, 30294, 15861, 24472,\n", + " 4800, 4800, 10936, 10936, 10936, 10936, 6877, 6877, 5928,\n", + " 21644]],\n", + " \n", + " [[ 1, 20857, 24836, 24836, 24836, 25389, 25389, 25389, 12705,\n", + " 12705, 12705, 12705, 30591, 1357, 1357, 4886, 4886, 4886,\n", + " 4886, 4886, 3972, 3972, 3972, 3972, 8270, 8270, 27491,\n", + " 13782, 13782, 24848, 10786, 10786, 22731, 20949, 19155, 19155,\n", + " 19155]]], dtype=int32), 10.372949, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [07:29<00:00, 3.48it/s, accuracy=0.116, cost=6.71] \n", + "minibatch loop: 100%|██████████| 40/40 [00:04<00:00, 8.35it/s, accuracy=0.118, cost=6.47]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": 
"python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/30.gru-seq2seq-beam.ipynb b/neural-machine-translation/30.gru-seq2seq-beam.ipynb deleted file mode 100644 index a6644ef..0000000 --- a/neural-machine-translation/30.gru-seq2seq-beam.ipynb +++ /dev/null @@ -1,402 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = 
fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - 
"execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = encoder_embedded, \n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = 
tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 0.5,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cells,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = tf.contrib.seq2seq.tile_batch(self.encoder_state, beam_width),\n", - " beam_width = beam_width,\n", - " output_layer = dense_layer,\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " 
correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in 
sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.598309, avg accuracy: 0.050472\n", - "epoch: 2, avg loss: 6.197631, avg accuracy: 0.058372\n", - "epoch: 3, avg loss: 6.120706, avg accuracy: 0.068117\n", - "epoch: 4, avg loss: 6.053386, avg accuracy: 0.079067\n", - "epoch: 5, avg loss: 5.961384, avg accuracy: 0.087013\n", - "epoch: 6, avg loss: 5.874920, avg accuracy: 0.092561\n", - "epoch: 7, avg loss: 5.769252, avg accuracy: 0.101284\n", - "epoch: 8, avg loss: 5.672418, avg accuracy: 0.105628\n", - "epoch: 9, avg loss: 5.588387, avg accuracy: 0.107274\n", - "epoch: 10, avg loss: 5.514471, avg accuracy: 0.111140\n", - "epoch: 11, avg loss: 5.388162, avg accuracy: 0.119001\n", - "epoch: 12, avg loss: 5.292507, avg accuracy: 0.123070\n", - "epoch: 13, avg loss: 5.151032, avg accuracy: 0.129496\n", - "epoch: 14, avg loss: 5.028226, avg accuracy: 0.132711\n", - "epoch: 15, avg loss: 4.921931, avg accuracy: 0.138244\n", - "epoch: 16, avg loss: 4.824532, avg accuracy: 0.140227\n", - "epoch: 17, avg loss: 4.729045, avg accuracy: 0.147684\n", - "epoch: 18, avg loss: 4.619212, avg accuracy: 0.154024\n", - "epoch: 19, avg loss: 4.535826, avg accuracy: 0.159074\n", - "epoch: 20, avg loss: 4.462981, avg accuracy: 0.163046\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - 
" feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi , mọi mọi , , bạn có thể hỏi , bạn sẽ sẽ không có thể không ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu tôi chọn giữa kiểu kiểu kiểu kiểu , bạn bạn sẽ sẽ bạn bạn sẽ ? 
\n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục tục thí thí gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi đó đó , giống , tài tài tài tài tài , tài tài tài tài tài tài tài , , tôi nhận thêm nguồn nguồn tài trợ , \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/30.gru-seq2seq-contrib-beam-luong.ipynb b/neural-machine-translation/30.gru-seq2seq-contrib-beam-luong.ipynb new file mode 100644 index 0000000..d1f38ba --- /dev/null +++ b/neural-machine-translation/30.gru-seq2seq-contrib-beam-luong.ipynb @@ -0,0 +1,838 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + 
"execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", 
+ " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled 
= tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { 
+ "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :33: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + 
"[array([[17978, 17978, 17978, 8641, 24680, 24680, 24680, 24680, 24680,\n", + " 24680, 24680, 24680, 24680, 24680, 24680, 15938, 5340, 12951,\n", + " 12951, 12951, 12951, 12951, 837, 837, 837, 837, 837,\n", + " 837, 25762, 25762, 25762, 25762, 31802, 8235, 8235, 16601,\n", + " 30709, 30709, 30709, 30709, 30709, 21454, 21454, 21454, 21454,\n", + " 5801, 27281, 12363, 12363, 12363, 12363, 12363, 12363, 12363,\n", + " 10247, 10247, 10247, 10247, 4951, 4951, 4951, 4951, 4951,\n", + " 28836, 28836, 28836, 21485, 23206, 23206, 23206, 23206, 7502],\n", + " [10552, 10552, 10552, 10552, 31545, 12915, 12915, 12915, 12362,\n", + " 24102, 24102, 24102, 31617, 20089, 20089, 20089, 27026, 27026,\n", + " 1299, 25489, 25489, 23610, 23610, 23610, 5026, 5026, 5026,\n", + " 5026, 5026, 5026, 5026, 5026, 26098, 26098, 26098, 8896,\n", + " 8896, 13024, 13024, 13024, 26098, 16926, 16926, 24488, 24488,\n", + " 24488, 24488, 24488, 1770, 1770, 1770, 1770, 1770, 1770,\n", + " 1770, 1770, 1770, 1770, 1770, 1770, 1770, 1770, 1770,\n", + " 1770, 1770, 1770, 1770, 1770, 1770, 1770, 29707, 12766],\n", + " [11867, 13536, 2818, 2818, 2818, 2818, 30915, 30915, 10623,\n", + " 10623, 10623, 10623, 10623, 10623, 20871, 13949, 13949, 13949,\n", + " 13949, 9922, 9922, 9922, 9922, 9922, 9922, 9922, 4729,\n", + " 4729, 4729, 4729, 3072, 3072, 3072, 3072, 3072, 7076,\n", + " 7076, 7076, 7076, 7076, 7076, 7076, 7076, 12963, 12963,\n", + " 7076, 12963, 12963, 12963, 12963, 24256, 24256, 29058, 29058,\n", + " 29058, 29058, 29058, 29058, 29025, 18055, 18055, 18055, 18055,\n", + " 18055, 18055, 9849, 17761, 17761, 17761, 17761, 17991, 24325],\n", + " [ 4424, 4424, 14165, 14084, 14084, 14084, 14084, 5052, 26143,\n", + " 26143, 26143, 26143, 26143, 12923, 12923, 30747, 30747, 30747,\n", + " 27182, 27182, 27182, 31878, 31878, 29210, 31878, 29210, 29210,\n", + " 22370, 22370, 1819, 1819, 1819, 24160, 24160, 13804, 13804,\n", + " 13804, 13804, 13804, 13804, 13804, 29416, 29416, 25621, 25621,\n", + " 25621, 19259, 
19259, 19259, 19259, 1009, 27228, 27228, 31412,\n", + " 31412, 10249, 10249, 10249, 10249, 14590, 24263, 14590, 14590,\n", + " 24263, 14590, 27544, 27544, 27544, 17580, 17580, 17580, 27380],\n", + " [27107, 27107, 27107, 25597, 25597, 1305, 1305, 21927, 21927,\n", + " 21927, 29925, 29925, 29925, 29925, 29641, 29641, 29641, 29641,\n", + " 26220, 26220, 26220, 26220, 26220, 26220, 26220, 2472, 2472,\n", + " 2472, 25517, 25517, 25517, 25517, 25517, 8836, 8836, 27606,\n", + " 27606, 4393, 4393, 2241, 2241, 2241, 2241, 12029, 12029,\n", + " 12029, 12029, 12029, 24179, 27172, 27172, 17509, 17509, 17509,\n", + " 17509, 17509, 17509, 23298, 23298, 22618, 22618, 22618, 22618,\n", + " 22618, 22618, 1662, 1662, 15705, 15705, 15705, 15705, 21692],\n", + " [ 9172, 9172, 9172, 28629, 12377, 28629, 12377, 28629, 19336,\n", + " 19336, 19336, 19336, 30709, 30709, 6515, 6515, 14053, 14053,\n", + " 14053, 13684, 13684, 13684, 17921, 17921, 17921, 17921, 14163,\n", + " 26497, 24368, 24368, 24368, 24368, 24368, 24368, 26293, 26293,\n", + " 26293, 26293, 31962, 31962, 31962, 1096, 1096, 1096, 1096,\n", + " 29472, 29472, 29472, 29472, 11642, 11642, 11642, 11642, 11642,\n", + " 11642, 11642, 6985, 6985, 8361, 8361, 8361, 11283, 18507,\n", + " 18507, 18507, 18507, 25722, 25722, 6717, 6717, 6717, 6717],\n", + " [ 7679, 7679, 5737, 5737, 18576, 18576, 18576, 18576, 19840,\n", + " 19840, 19840, 19840, 19840, 19840, 19840, 19840, 19840, 18288,\n", + " 31096, 31096, 31096, 31096, 31096, 31096, 31096, 31096, 12684,\n", + " 12684, 4116, 4116, 2927, 2927, 2927, 2927, 2927, 2927,\n", + " 2927, 23364, 23364, 2927, 2927, 2927, 2927, 2927, 2927,\n", + " 23364, 23364, 2927, 2927, 2927, 2927, 2927, 24488, 24488,\n", + " 2927, 24488, 24488, 24488, 24488, 4096, 4096, 24488, 24488,\n", + " 20923, 20923, 20923, 20923, 29644, 29644, 29644, 29644, 4697],\n", + " [21891, 21891, 3325, 3325, 3325, 12500, 6856, 6856, 6856,\n", + " 6856, 15128, 15128, 15128, 15128, 15128, 15128, 15128, 5779,\n", + " 5779, 5779, 
2414, 9630, 9630, 9630, 1271, 1271, 8146,\n", + " 8146, 8146, 8146, 571, 571, 571, 571, 571, 1757,\n", + " 1757, 1757, 1998, 1998, 1998, 1998, 4404, 4404, 30835,\n", + " 30835, 30835, 30835, 31512, 31512, 31512, 31512, 22308, 22308,\n", + " 22308, 22308, 19578, 13788, 13788, 13788, 13788, 13788, 13788,\n", + " 6408, 6408, 20913, 23621, 23621, 23621, 23621, 7031, 7031],\n", + " [ 4925, 4925, 13604, 13604, 13604, 16544, 16544, 16544, 16544,\n", + " 16544, 16544, 16544, 29554, 29554, 29554, 29554, 26381, 29554,\n", + " 26381, 20252, 20252, 20252, 20252, 20252, 24409, 24409, 24409,\n", + " 19330, 4349, 19330, 19330, 19330, 27655, 27655, 13303, 27655,\n", + " 13303, 22948, 22948, 13699, 13699, 5056, 5056, 27710, 27710,\n", + " 25981, 25981, 25981, 27080, 27080, 27080, 27080, 21865, 21865,\n", + " 21865, 21865, 11191, 11191, 11191, 11191, 11191, 11191, 11191,\n", + " 11191, 11191, 11191, 11191, 11191, 11191, 11191, 11191, 11191],\n", + " [22031, 22031, 22031, 22031, 22031, 12795, 12795, 12795, 12795,\n", + " 12795, 23284, 23284, 20913, 20913, 20913, 20913, 20913, 20913,\n", + " 20913, 20913, 20913, 20913, 20913, 20913, 20913, 20913, 20913,\n", + " 20913, 3622, 3622, 3622, 3363, 3363, 3363, 360, 20913,\n", + " 20913, 20913, 20913, 20913, 20913, 20913, 20913, 20913, 20913,\n", + " 20913, 20913, 20913, 20913, 20913, 20913, 20913, 20913, 21165,\n", + " 20913, 21165, 21165, 21165, 21165, 21165, 6294, 16768, 16768,\n", + " 16365, 16365, 16365, 18402, 18402, 18402, 18402, 18402, 22472]],\n", + " dtype=int32), 10.372692, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + 
"name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:52<00:00, 2.39it/s, accuracy=0.284, cost=4.39]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.82it/s, accuracy=0.333, cost=3.99]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.003980886" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/31.lstm-birnn-seq2seq-beam-luong.ipynb b/neural-machine-translation/31.lstm-birnn-seq2seq-beam-luong.ipynb deleted file mode 100644 index a15a248..0000000 --- a/neural-machine-translation/31.lstm-birnn-seq2seq-beam-luong.ipynb +++ /dev/null @@ -1,439 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - 
] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, 
vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = 
tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " self.encoder_out = tf.concat((out_fw, out_bw), 2)\n", - " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", - " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = 
tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer, reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cell,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(cell_state = encoder_state_tiled),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, 
_reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.609526, avg accuracy: 0.038657\n", - "epoch: 2, avg loss: 6.146571, avg accuracy: 0.070343\n", - "epoch: 3, avg loss: 6.013822, avg accuracy: 0.077701\n", - "epoch: 4, avg loss: 5.923426, avg accuracy: 0.084355\n", - "epoch: 5, avg loss: 5.843805, avg accuracy: 0.089249\n", - "epoch: 6, avg loss: 5.715099, avg accuracy: 0.098950\n", - "epoch: 7, avg loss: 5.633172, avg accuracy: 0.102873\n", - "epoch: 8, avg loss: 5.616269, avg accuracy: 0.103309\n", - "epoch: 9, avg loss: 5.426945, avg accuracy: 0.109016\n", - "epoch: 10, avg 
loss: 5.230996, avg accuracy: 0.115398\n", - "epoch: 11, avg loss: 5.079673, avg accuracy: 0.120815\n", - "epoch: 12, avg loss: 4.950464, avg accuracy: 0.123280\n", - "epoch: 13, avg loss: 4.813243, avg accuracy: 0.127943\n", - "epoch: 14, avg loss: 4.718412, avg accuracy: 0.128218\n", - "epoch: 15, avg loss: 4.632993, avg accuracy: 0.132206\n", - "epoch: 16, avg loss: 4.511515, avg accuracy: 0.143005\n", - "epoch: 17, avg loss: 4.360492, avg accuracy: 0.152970\n", - "epoch: 18, avg loss: 4.230304, avg accuracy: 0.159651\n", - "epoch: 19, avg loss: 4.135100, avg accuracy: 0.165019\n", - "epoch: 20, avg loss: 4.037326, avg accuracy: 0.171741\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? 
đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi khi không , không , không , không , không không , không không không không không không không \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn chọn kiểu 2 kiểu cái cái cái cái cái cái cái cái cái cái cái cái cái \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tục tục tục này này này này này \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau sau đó , giống tài tài tài tài tài tài tài tài tài tài tài tài tài trợ tài trợ tài trợ tài trợ trợ tài trợ trợ tài trợ trợ tài trợ trợ tài trợ trợ trợ trợ trợ trợ trợ trợ trợ trợ trợ trợ trợ \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": 
"ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/31.lstm-seq2seq-contrib-beam-bahdanau.ipynb b/neural-machine-translation/31.lstm-seq2seq-contrib-beam-bahdanau.ipynb new file mode 100644 index 0000000..8ca9cde --- /dev/null +++ b/neural-machine-translation/31.lstm-seq2seq-contrib-beam-bahdanau.ipynb @@ -0,0 +1,835 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, 
learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = 
tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = 
tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :33: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + 
"sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 2997, 2997, 4652, 5103, 5103, 5103, 5103, 5103, 5103,\n", + " 3027, 3027, 3027, 3027, 28179, 28179, 28179, 24611, 24611,\n", + " 31060, 24611, 31060, 31060, 31060, 6973, 6973, 6973, 2311,\n", + " 23258, 4617, 4617, 4617, 4617, 4617, 4617, 4617, 4617,\n", + " 4617, 8479, 8479, 22923, 22923, 22923, 24483, 24483, 24483,\n", + " 24483, 24483, 15174, 17543, 2812, 2812, 2812, 15174, 2812,\n", + " 2812, 2812, 2812, 2812, 13765, 2812, 13765, 13765, 2084,\n", + " 17770, 24367, 24367, 24367, 24367, 24367, 21129, 21129, 21129],\n", + " [ 1553, 1553, 1553, 1553, 16429, 16429, 16429, 16429, 16429,\n", + " 12654, 24705, 24705, 24705, 24705, 24705, 24705, 9250, 9250,\n", + " 25577, 9250, 9250, 9250, 9250, 15755, 15755, 15755, 15755,\n", + " 11352, 11352, 11352, 11352, 11352, 11352, 3858, 3858, 3858,\n", + " 3858, 3858, 3858, 29150, 31673, 31673, 31673, 31673, 31673,\n", + " 29150, 7190, 23623, 23623, 23623, 23623, 25644, 25644, 22335,\n", + " 22335, 22335, 27412, 27412, 27412, 27412, 27412, 27412, 24346,\n", + " 24346, 12629, 12629, 12629, 12629, 2604, 2604, 2604, 17781],\n", + " [ 4140, 1484, 1484, 21634, 21634, 21634, 21880, 21880, 21880,\n", + " 2786, 2786, 2228, 2228, 2228, 2228, 2058, 2058, 2058,\n", + " 2058, 9165, 9165, 9165, 9165, 22339, 22339, 22339, 22339,\n", + " 22339, 22339, 22339, 5595, 10937, 10937, 10937, 10715, 10715,\n", + " 10715, 10715, 10715, 27656, 27656, 19640, 19640, 19640, 9923,\n", + " 31576, 31576, 31576, 31576, 31576, 31576, 6295, 31576, 6295,\n", + " 6295, 10556, 10556, 28289, 15285, 15285, 2737, 2737, 2737,\n", + " 2737, 2737, 2737, 2737, 16878, 2737, 16878, 6221, 6221],\n", + " [22904, 17045, 
17045, 17045, 17045, 17045, 3171, 18115, 18115,\n", + " 25266, 25266, 25266, 25266, 25266, 7293, 7293, 4233, 4233,\n", + " 10656, 10656, 10656, 10656, 10656, 23989, 26040, 26040, 12499,\n", + " 3482, 3541, 3541, 3541, 22434, 22434, 22434, 8291, 8291,\n", + " 8291, 26757, 26757, 26757, 4877, 4877, 4877, 4877, 10754,\n", + " 10754, 10754, 10754, 10754, 10754, 10754, 15769, 15769, 27376,\n", + " 27376, 27376, 23759, 4035, 4035, 4035, 13612, 13612, 31528,\n", + " 31528, 31528, 16748, 16748, 16748, 16748, 28249, 28249, 28249],\n", + " [26703, 26703, 26357, 29984, 29984, 15102, 15102, 1079, 24911,\n", + " 24911, 24911, 26612, 31914, 31914, 31914, 31914, 31914, 17197,\n", + " 17197, 17197, 17197, 31914, 6772, 6772, 6772, 31018, 5871,\n", + " 5871, 5871, 5871, 5871, 25278, 25278, 9995, 9995, 9995,\n", + " 9995, 9995, 9995, 9995, 26190, 26749, 26749, 26749, 26190,\n", + " 26190, 14261, 14261, 14261, 28729, 28729, 28729, 28729, 22649,\n", + " 22649, 22649, 21017, 21017, 21017, 21017, 21017, 31582, 31582,\n", + " 31582, 31582, 6117, 6117, 6117, 6117, 2280, 2280, 13174],\n", + " [ 7393, 12241, 12241, 17159, 17159, 17159, 17159, 28226, 28226,\n", + " 28226, 28226, 28226, 28226, 11368, 18059, 18059, 18059, 18059,\n", + " 5129, 5129, 8490, 8490, 16984, 16984, 16984, 26621, 3031,\n", + " 3031, 3031, 20376, 20376, 20376, 20376, 9806, 9806, 17704,\n", + " 30030, 30030, 30030, 30030, 24407, 24407, 16391, 16391, 7512,\n", + " 7512, 1612, 20703, 20703, 27551, 27551, 27551, 27551, 27551,\n", + " 27551, 27551, 27551, 27551, 27551, 27551, 27551, 24491, 24491,\n", + " 24491, 24491, 25428, 25428, 25428, 1649, 27893, 27893, 27893],\n", + " [14886, 14886, 14886, 23258, 23258, 23258, 23258, 18532, 19509,\n", + " 24500, 24500, 24500, 22117, 22117, 22117, 22117, 22117, 23012,\n", + " 23012, 7585, 7585, 7585, 7585, 7585, 13328, 13328, 21484,\n", + " 29890, 29890, 29890, 29890, 1654, 1654, 1654, 275, 7305,\n", + " 7305, 7305, 7305, 7305, 490, 490, 490, 4471, 4471,\n", + " 4471, 10010, 10010, 
18907, 24886, 24886, 24886, 24886, 24886,\n", + " 24886, 8956, 8956, 25379, 25379, 25379, 25379, 25379, 20499,\n", + " 6210, 6210, 6210, 6210, 6210, 2762, 2762, 17508, 17508],\n", + " [31045, 31045, 22084, 22084, 22084, 22084, 22084, 2812, 2812,\n", + " 2812, 2812, 2812, 28529, 2492, 2492, 2492, 2492, 2492,\n", + " 9295, 9295, 9295, 9295, 9295, 9295, 28784, 28784, 28784,\n", + " 28784, 28784, 28784, 28784, 2576, 18565, 18565, 23616, 28784,\n", + " 28784, 28784, 18565, 23993, 23993, 23993, 23993, 22703, 22703,\n", + " 22703, 26231, 26231, 29668, 26231, 29668, 10915, 10915, 10915,\n", + " 10915, 10915, 14178, 14178, 14178, 14178, 4964, 4964, 4964,\n", + " 17344, 17344, 28848, 28848, 15382, 10671, 10671, 15382, 15382],\n", + " [28734, 11778, 11778, 11778, 25668, 191, 191, 191, 191,\n", + " 13759, 9201, 9201, 9201, 9201, 9201, 9201, 29232, 29232,\n", + " 29232, 29232, 15658, 15658, 15658, 15658, 11093, 21547, 21547,\n", + " 19215, 19215, 19215, 19215, 13108, 13108, 13108, 13108, 7668,\n", + " 7668, 7668, 7668, 20033, 20033, 20033, 19669, 19669, 19669,\n", + " 18538, 18538, 28481, 23415, 23415, 23415, 23415, 23415, 13046,\n", + " 13046, 21391, 21391, 21391, 21391, 21391, 7989, 7989, 16703,\n", + " 16703, 16703, 16703, 16703, 16703, 16703, 16703, 17625, 10000],\n", + " [14525, 21880, 21880, 17764, 17764, 17764, 17764, 1658, 1658,\n", + " 23757, 23757, 23757, 29346, 29346, 18311, 18311, 18075, 18075,\n", + " 18075, 18075, 5181, 5181, 5181, 5181, 5181, 7744, 26404,\n", + " 26404, 26404, 26404, 26404, 26404, 18559, 18559, 26404, 18559,\n", + " 18559, 18559, 8841, 14291, 28707, 28707, 28707, 28036, 28036,\n", + " 28036, 28036, 28036, 4907, 31020, 31020, 31020, 31020, 12547,\n", + " 12547, 27428, 29022, 11029, 11029, 29022, 26715, 26715, 26715,\n", + " 26435, 26435, 26435, 26435, 9292, 9292, 9292, 3008, 3008]],\n", + " dtype=int32), 10.373898, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = 
pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:03<00:00, 2.36it/s, accuracy=0.337, cost=4.09]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.91it/s, accuracy=0.392, cost=3.65]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.17929372" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/32.gru-birnn-seq2seq-beam-luong.ipynb b/neural-machine-translation/32.gru-birnn-seq2seq-beam-luong.ipynb deleted file mode 100644 index 4321308..0000000 --- 
a/neural-machine-translation/32.gru-birnn-seq2seq-beam-luong.ipynb +++ /dev/null @@ -1,439 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), 
('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - 
"class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def cells(size, reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layer // 2),\n", - " cell_bw = cells(size_layer // 2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw, state_bw), -1)\n", - " encoder_state = tuple([bi_state] * num_layers)\n", - " encoder_state = tuple(encoder_state[-1] for _ in range(num_layers))\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " main = 
tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer, reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cell,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " 
end_token = EOS,\n", - " initial_state = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(cell_state = encoder_state_tiled),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, _reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.609513, avg accuracy: 0.044634\n", - "epoch: 2, avg loss: 6.158182, avg accuracy: 0.055338\n", - "epoch: 3, avg loss: 6.072051, avg accuracy: 0.074243\n", - "epoch: 4, avg loss: 5.924409, avg accuracy: 0.081263\n", - "epoch: 5, avg loss: 5.894405, avg accuracy: 0.085555\n", - "epoch: 6, avg loss: 5.840099, avg accuracy: 0.090470\n", - "epoch: 7, avg loss: 5.662232, avg accuracy: 0.096010\n", - "epoch: 8, avg loss: 5.531995, avg accuracy: 0.101268\n", - "epoch: 9, avg loss: 5.407217, avg accuracy: 0.108205\n", - "epoch: 10, avg 
loss: 5.257470, avg accuracy: 0.109851\n", - "epoch: 11, avg loss: 5.117006, avg accuracy: 0.119107\n", - "epoch: 12, avg loss: 4.984791, avg accuracy: 0.121906\n", - "epoch: 13, avg loss: 4.804212, avg accuracy: 0.135463\n", - "epoch: 14, avg loss: 4.691786, avg accuracy: 0.133704\n", - "epoch: 15, avg loss: 4.579430, avg accuracy: 0.137813\n", - "epoch: 16, avg loss: 4.445919, avg accuracy: 0.143831\n", - "epoch: 17, avg loss: 4.258706, avg accuracy: 0.152421\n", - "epoch: 18, avg loss: 4.075991, avg accuracy: 0.171615\n", - "epoch: 19, avg loss: 3.932725, avg accuracy: 0.183077\n", - "epoch: 20, avg loss: 3.854742, avg accuracy: 0.189787\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau sau thôi làm đau đau đau đau đau đau đau đau đau ? đau ? 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc hoặc chọn 2 2 kiểu kiểu đau ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tiếp tục tục thời thời thời gian gian thời thời \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: sau sau sau giống giống giống đề cứu cứu nguồn nguồn nguồn nguồn trợ nguồn nguồn trợ trợ nguồn nguồn trợ trợ nguồn nguồn trợ trợ nguồn nguồn trợ nguồn nguồn nguồn trợ nguồn nguồn nguồn nguồn nguồn nguồn nguồn nguồn nguồn . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git 
a/neural-machine-translation/32.gru-seq2seq-contrib-beam-bahdanau.ipynb b/neural-machine-translation/32.gru-seq2seq-contrib-beam-bahdanau.ipynb new file mode 100644 index 0000000..eeb9cbf --- /dev/null +++ b/neural-machine-translation/32.gru-seq2seq-contrib-beam-bahdanau.ipynb @@ -0,0 +1,838 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(reuse=False):\n", + " return 
tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = 
dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " 
correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :33: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be 
removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": 
[ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[25507, 30476, 30476, 30476, 30288, 30288, 30288, 6869, 4344,\n", + " 4344, 4344, 4727, 4344, 19329, 19329, 19329, 19329, 11026,\n", + " 16153, 16153, 16153, 16153, 16153, 16153, 18614, 13763, 925,\n", + " 31739, 925, 1883, 1883, 1883, 1883, 1883, 1883, 1883,\n", + " 28615, 28615, 13490, 13490, 13490, 13490, 13490, 13490, 13490,\n", + " 15782, 15782, 15782, 15782, 28372, 27509, 28372, 28372, 28372,\n", + " 21754, 21754, 21754, 21754, 21754, 9832, 9832, 9832, 7652,\n", + " 7652, 7652, 7652, 7652, 7652, 27509, 27509, 27509, 27509],\n", + " [ 5499, 12746, 12746, 15702, 15702, 15702, 15702, 15702, 17510,\n", + " 7832, 7832, 7832, 7832, 7832, 7832, 7832, 7832, 7832,\n", + " 7832, 7832, 21267, 14591, 14591, 21267, 21267, 21267, 10229,\n", + " 10229, 10229, 10229, 10229, 10229, 23192, 23192, 23192, 23192,\n", + " 25841, 25841, 10491, 10491, 18560, 18560, 18560, 18294, 18294,\n", + " 18294, 18294, 18294, 17616, 17616, 17616, 17616, 17616, 30115,\n", + " 30115, 30115, 30115, 7368, 7368, 7368, 7368, 7368, 12847,\n", + " 12847, 12847, 12847, 12847, 12847, 27928, 27928, 27928, 21558],\n", + " [17726, 10466, 10466, 28474, 10466, 16859, 16859, 16859, 16859,\n", + " 16859, 18561, 18561, 22063, 4918, 4733, 4733, 4733, 12195,\n", + " 12195, 12195, 12195, 12195, 12195, 23903, 23903, 23903, 23903,\n", + " 23903, 23903, 23903, 23903, 27029, 27439, 27439, 27439, 27439,\n", + " 6138, 12025, 3911, 3911, 3911, 3911, 3911, 3911, 3911,\n", + " 26278, 9810, 9810, 9810, 9810, 9810, 9810, 
5301, 2112,\n", + " 2112, 2112, 21743, 12829, 12829, 12829, 21743, 21883, 6157,\n", + " 6157, 6157, 6157, 6157, 6157, 174, 174, 18946, 18946],\n", + " [10154, 7030, 23375, 23375, 23375, 23375, 550, 17904, 550,\n", + " 17904, 126, 126, 126, 126, 126, 126, 126, 126,\n", + " 126, 126, 18880, 18880, 126, 18880, 18880, 18880, 18880,\n", + " 18880, 18880, 5265, 5265, 5265, 5265, 5265, 5265, 10306,\n", + " 10306, 10306, 10306, 10306, 10306, 1198, 1198, 1198, 24348,\n", + " 24348, 24348, 24348, 1198, 1198, 29578, 29578, 29578, 29578,\n", + " 27378, 27378, 27378, 5547, 5547, 5547, 5547, 11497, 31810,\n", + " 11497, 11497, 11497, 11497, 11497, 12495, 12495, 12495, 9695],\n", + " [ 9805, 4610, 23317, 17865, 17865, 6578, 1495, 1495, 5517,\n", + " 5517, 5517, 5517, 5517, 5517, 5517, 31549, 31549, 22771,\n", + " 4559, 21709, 21709, 17404, 17404, 17404, 17404, 17404, 17404,\n", + " 28291, 28291, 28291, 28291, 28291, 28291, 28291, 28291, 28291,\n", + " 28291, 30726, 30726, 25989, 30726, 25989, 25989, 25989, 25989,\n", + " 25989, 1740, 1740, 1740, 26721, 27053, 27053, 24830, 24830,\n", + " 24830, 24830, 24830, 24830, 24830, 24830, 24830, 24830, 24830,\n", + " 24830, 10525, 24830, 10525, 10525, 15722, 15722, 14012, 14012],\n", + " [15479, 13567, 13567, 31284, 31284, 2858, 2858, 2760, 2760,\n", + " 2760, 2760, 2760, 31123, 19975, 27237, 19975, 19975, 31802,\n", + " 31802, 31802, 16222, 15348, 15348, 31428, 31428, 31428, 28831,\n", + " 28831, 28831, 10491, 10491, 10491, 13895, 13895, 13895, 13895,\n", + " 13895, 13895, 13895, 6564, 6564, 6564, 27536, 27536, 8195,\n", + " 8195, 8195, 19396, 19396, 2576, 19396, 13895, 13895, 13895,\n", + " 13895, 13895, 16828, 16828, 16828, 16828, 16828, 16828, 16828,\n", + " 16828, 18408, 24983, 24983, 24983, 11617, 11617, 11617, 11617],\n", + " [ 1827, 17229, 28553, 31602, 31602, 31602, 6092, 21424, 21424,\n", + " 5305, 5305, 5305, 5305, 16957, 16957, 16957, 16957, 16957,\n", + " 16957, 16957, 173, 18141, 29222, 29222, 29222, 29222, 29222,\n", + " 
10019, 10019, 29222, 10019, 10019, 10019, 10019, 29791, 29791,\n", + " 29791, 29791, 29791, 14038, 14038, 14038, 14038, 14038, 3374,\n", + " 3374, 31124, 3158, 3158, 3158, 3158, 3158, 3158, 3158,\n", + " 3158, 3158, 3158, 3158, 3158, 3158, 3158, 3158, 3158,\n", + " 3158, 3158, 3158, 3158, 3158, 3158, 3158, 2170, 2170],\n", + " [ 2842, 18022, 18022, 24660, 12311, 12311, 12311, 12311, 12311,\n", + " 12311, 9639, 9639, 9639, 9639, 9639, 9639, 9639, 31310,\n", + " 31310, 31310, 31310, 18486, 15688, 18486, 14685, 16640, 16640,\n", + " 16640, 16640, 16640, 16640, 27350, 27350, 27350, 25351, 9870,\n", + " 14785, 14785, 31936, 7565, 7565, 7565, 7565, 7565, 3159,\n", + " 28376, 3159, 28376, 3159, 4889, 4889, 4889, 3159, 3159,\n", + " 3159, 3159, 31306, 31306, 31306, 31306, 31306, 31306, 31306,\n", + " 31306, 23673, 15212, 15212, 15212, 15212, 15212, 16319, 16319],\n", + " [ 4721, 4721, 24020, 24020, 4028, 4028, 4028, 14951, 31330,\n", + " 31330, 31330, 31330, 31330, 31330, 31330, 7368, 7368, 7368,\n", + " 7368, 7368, 19614, 19614, 19614, 10639, 10639, 10639, 10639,\n", + " 10639, 7265, 7265, 7265, 7265, 7265, 7265, 7265, 7265,\n", + " 7265, 7265, 7265, 13824, 7265, 13824, 13824, 13824, 13824,\n", + " 13824, 7516, 7516, 7516, 7516, 26307, 26307, 19590, 19590,\n", + " 19590, 19590, 19590, 19590, 18522, 18522, 20312, 20312, 20312,\n", + " 20312, 20312, 20312, 20312, 20312, 4648, 18052, 15308, 15308],\n", + " [10068, 6869, 6869, 12858, 12858, 12858, 12858, 14428, 14428,\n", + " 31281, 24315, 24315, 24315, 24315, 24315, 21409, 24315, 14527,\n", + " 12709, 12709, 12709, 12709, 12709, 12709, 16728, 21356, 17562,\n", + " 17562, 17562, 4504, 13629, 13629, 13629, 190, 190, 190,\n", + " 13920, 29885, 29885, 7328, 7328, 7328, 7328, 7328, 17919,\n", + " 17919, 17919, 17919, 17919, 24855, 24855, 24855, 24855, 20240,\n", + " 20240, 20240, 20240, 22604, 22604, 22604, 22604, 3822, 15200,\n", + " 15200, 3425, 3425, 28879, 28879, 28879, 13411, 13411, 13411]],\n", + " dtype=int32), 10.372644, 
0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:06<00:00, 2.34it/s, accuracy=0.383, cost=3.71]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.87it/s, accuracy=0.419, cost=3.14]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.1767827" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/33.lstm-birnn-seq2seq-contrib-beam-bahdanau.ipynb 
b/neural-machine-translation/33.lstm-birnn-seq2seq-contrib-beam-bahdanau.ipynb new file mode 100644 index 0000000..2933cc6 --- /dev/null +++ b/neural-machine-translation/33.lstm-birnn-seq2seq-contrib-beam-bahdanau.ipynb @@ -0,0 +1,817 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return 
tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " encoder_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = 
attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " 
masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use 
`keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from 
tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[23168, 23168, 23168, 31171, 31171, 13772, 13772, 13772, 12402,\n", + " 12402, 12402, 12402, 28816, 28816, 28816, 28816, 28816, 28816,\n", + " 20852, 20852, 15630, 6855, 6855, 6855, 6855, 6855, 6855,\n", + " 6855, 2631, 2631, 17941, 17941, 19057, 19057, 15185, 15185,\n", + " 15185, 16253, 16253, 16253, 16253, 16253, 16253, 31464, 31464,\n", + " 31464, 31464, 31464, 31464, 22820, 22820, 22820, 10320, 10320,\n", + " 10320, 10320, 20852, 17571, 17571, 17571, 17571, 21324, 21324,\n", + " 21324, 21324, 26041, 26041, 26041, 26041, 24964, 24964, 8724],\n", + " [31956, 31956, 31956, 31956, 12856, 12856, 12856, 12856, 12856,\n", + " 10310, 10310, 10310, 13742, 13742, 13742, 13742, 13742, 13742,\n", + " 19381, 8841, 8841, 8841, 8841, 14559, 14559, 14559, 11446,\n", + " 11446, 25518, 25518, 490, 490, 14940, 5620, 14940, 5620,\n", + " 5620, 26876, 10891, 10891, 10891, 10891, 29976, 3148, 3148,\n", 
+ " 3148, 29976, 29976, 18657, 18941, 18657, 18657, 18657, 15270,\n", + " 15270, 15270, 15270, 15270, 3141, 3141, 3141, 6794, 6794,\n", + " 6794, 6794, 9298, 9298, 4277, 4277, 4277, 4277, 3935],\n", + " [30700, 30700, 9838, 4537, 4537, 25306, 25306, 25306, 25306,\n", + " 25306, 25306, 16458, 16458, 16458, 16458, 17691, 17691, 17691,\n", + " 17691, 17691, 15996, 30700, 30700, 30700, 30700, 30700, 30700,\n", + " 8628, 23752, 23752, 23752, 23752, 23752, 23752, 23752, 23752,\n", + " 16710, 16710, 16710, 24927, 24927, 24927, 6937, 6937, 6937,\n", + " 6937, 6937, 13124, 13485, 13485, 13485, 13485, 13485, 13485,\n", + " 1056, 1056, 1056, 1056, 1056, 1056, 18876, 18876, 25306,\n", + " 25306, 25306, 25306, 25306, 25306, 9211, 9211, 14374, 14367],\n", + " [22232, 26545, 26545, 26545, 26545, 26545, 26545, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309,\n", + " 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309, 22309],\n", + " [ 9233, 29119, 29119, 4086, 4086, 4086, 4086, 4086, 4086,\n", + " 4086, 4086, 4086, 4086, 19433, 4086, 19433, 4086, 4086,\n", + " 4086, 483, 483, 483, 29214, 27277, 27277, 1474, 26773,\n", + " 26773, 23656, 23385, 23656, 23656, 23385, 23656, 14694, 14694,\n", + " 14694, 14694, 14694, 23670, 23670, 23670, 23670, 7375, 23907,\n", + " 23907, 23907, 23907, 23907, 23907, 23907, 23907, 23907, 23907,\n", + " 23907, 23907, 27572, 27572, 28252, 28252, 28252, 3308, 1600,\n", + " 3308, 9259, 9259, 9259, 4020, 9259, 15117, 15117, 10397],\n", + " [ 9765, 9765, 9765, 9765, 20318, 20318, 20318, 20318, 23436,\n", + " 23436, 20576, 20576, 20576, 20576, 22256, 22256, 13586, 
13586,\n", + " 13586, 13586, 28659, 28659, 22160, 19811, 19811, 19811, 19811,\n", + " 19811, 17575, 31410, 31410, 31410, 581, 581, 1376, 1376,\n", + " 1376, 17575, 22307, 22307, 22307, 22307, 31046, 31046, 31046,\n", + " 10837, 10837, 6026, 6026, 6026, 6026, 6026, 11180, 28082,\n", + " 28082, 28082, 28082, 15934, 15934, 15934, 23332, 23332, 23332,\n", + " 23332, 20731, 26279, 26279, 26279, 26279, 26279, 14793, 14793],\n", + " [27139, 31243, 31243, 31243, 31243, 20726, 9794, 9794, 26647,\n", + " 5955, 14632, 14632, 14632, 14632, 14632, 14632, 24522, 4873,\n", + " 4873, 4873, 4873, 4873, 4873, 4873, 27048, 27048, 27048,\n", + " 27048, 11336, 11336, 2809, 2809, 2809, 2809, 2809, 2809,\n", + " 2809, 2809, 2714, 3417, 4873, 4873, 4873, 4873, 4873,\n", + " 28355, 24167, 24167, 24167, 24167, 24167, 24167, 24167, 11758,\n", + " 11758, 11758, 11758, 11758, 11758, 5805, 5805, 5805, 5805,\n", + " 5805, 28355, 2605, 2605, 2605, 2605, 12759, 12759, 31347],\n", + " [ 896, 896, 896, 896, 14334, 14334, 14334, 14334, 27482,\n", + " 27482, 20071, 20071, 20071, 463, 1420, 1420, 1420, 898,\n", + " 898, 898, 4042, 4042, 20850, 20850, 20850, 20850, 20850,\n", + " 20850, 532, 532, 532, 532, 23266, 23266, 28902, 28902,\n", + " 28902, 28902, 5029, 20438, 20438, 20438, 20438, 20438, 20438,\n", + " 20438, 20438, 12948, 12948, 12948, 12948, 31497, 2623, 2623,\n", + " 2623, 2623, 2623, 2623, 2623, 30020, 2623, 30020, 30020,\n", + " 30020, 30020, 30020, 30020, 30020, 30020, 31828, 7368, 7368],\n", + " [ 4620, 4620, 4620, 4620, 4620, 10586, 10586, 4620, 10586,\n", + " 10586, 10586, 10586, 10586, 10586, 6859, 6329, 19739, 19739,\n", + " 19739, 19739, 19739, 19739, 19739, 13150, 13150, 13150, 13150,\n", + " 6608, 27766, 27766, 22472, 22472, 22472, 5741, 5741, 3780,\n", + " 3780, 14344, 14344, 14344, 14344, 14344, 4577, 221, 221,\n", + " 221, 221, 221, 13980, 221, 23968, 23968, 23968, 23968,\n", + " 23968, 23968, 23968, 23968, 23968, 13851, 23968, 11300, 23968,\n", + " 11300, 16456, 16456, 16456, 
16456, 4202, 21848, 18258, 18258],\n", + " [16804, 569, 569, 569, 569, 5, 5, 28030, 28030,\n", + " 28030, 26793, 26793, 26793, 26793, 26793, 27082, 10313, 10313,\n", + " 10313, 10313, 1867, 1867, 1867, 1867, 1867, 11674, 11674,\n", + " 26866, 26866, 26866, 26866, 14886, 20792, 20792, 5, 5,\n", + " 5, 5, 5, 1345, 1345, 25486, 25486, 25486, 15874,\n", + " 15874, 15874, 15874, 15874, 15874, 15874, 15874, 15874, 15874,\n", + " 15023, 15023, 10901, 10513, 10513, 10513, 10513, 10513, 10513,\n", + " 7066, 7066, 7066, 7066, 7066, 18825, 18825, 18825, 18825]],\n", + " dtype=int32), 10.3733425, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [13:56<00:00, 1.87it/s, accuracy=0.388, cost=3.71]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.89it/s, accuracy=0.446, cost=3.2] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.19480321" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/33.lstm-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb b/neural-machine-translation/33.lstm-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb deleted file mode 100644 index 6da9765..0000000 --- a/neural-machine-translation/33.lstm-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb +++ /dev/null @@ -1,474 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, 
- "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - 
"print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def cells(size, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " def bahdanau(size):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " def 
luong(size):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(size_layer//2),\n", - " cell_bw = luong(size_layer//2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw[0].c, state_bw[0].c), -1)\n", - " bi_state_h = tf.concat((state_fw[0].h, state_bw[0].h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " dense = tf.layers.Dense(to_dict_size)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " bahdanau_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = 
tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = decoder_cells.zero_state(batch_size, tf.float32),\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " bahdanau_cells 
= tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cells,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cells.zero_state(batch_size * beam_width, tf.float32),\n", - " beam_width = beam_width,\n", - " output_layer = dense,\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 
20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.578953, avg accuracy: 0.047377\n", - "epoch: 2, avg loss: 6.173394, avg accuracy: 
0.056745\n", - "epoch: 3, avg loss: 6.097245, avg accuracy: 0.056194\n", - "epoch: 4, avg loss: 6.015269, avg accuracy: 0.065065\n", - "epoch: 5, avg loss: 5.933231, avg accuracy: 0.071004\n", - "epoch: 6, avg loss: 5.866334, avg accuracy: 0.070673\n", - "epoch: 7, avg loss: 5.808472, avg accuracy: 0.070024\n", - "epoch: 8, avg loss: 5.745262, avg accuracy: 0.074432\n", - "epoch: 9, avg loss: 5.669869, avg accuracy: 0.076299\n", - "epoch: 10, avg loss: 5.590339, avg accuracy: 0.078071\n", - "epoch: 11, avg loss: 5.511030, avg accuracy: 0.081998\n", - "epoch: 12, avg loss: 5.430203, avg accuracy: 0.084098\n", - "epoch: 13, avg loss: 5.353356, avg accuracy: 0.086704\n", - "epoch: 14, avg loss: 5.276394, avg accuracy: 0.087023\n", - "epoch: 15, avg loss: 5.188393, avg accuracy: 0.089363\n", - "epoch: 16, avg loss: 5.097336, avg accuracy: 0.087282\n", - "epoch: 17, avg loss: 4.987069, avg accuracy: 0.092019\n", - "epoch: 18, avg loss: 4.878201, avg accuracy: 0.095458\n", - "epoch: 19, avg loss: 4.788191, avg accuracy: 0.096424\n", - "epoch: 20, avg loss: 4.680217, avg accuracy: 0.098961\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: nhưng thôi thôi , , , , , , , hỏi ? ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: nếu nếu chọn chọn chọn chọn nào nào nào nào nào cái cái cái cái cái \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi tục tôi tôi tôi tôi tôi tôi tôi tôi tôi \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi tôi tôi tôi tôi tài tài tài tài tài tài tài thêm thêm trợ trợ \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - 
"nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/34.gru-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb b/neural-machine-translation/34.gru-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb deleted file mode 100644 index 040e93b..0000000 --- a/neural-machine-translation/34.gru-birnn-seq2seq-luong-bahdanau-stack-beam.ipynb +++ /dev/null @@ -1,472 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = 
fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - 
"PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def cells(size, reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size, reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " def bahdanau(size):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " def luong(size):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, 
state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(size_layer//2),\n", - " cell_bw = luong(size_layer//2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw[0],state_bw[0]), -1)\n", - " encoder_state = tuple([bi_state] * num_layers)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " bahdanau_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = 
tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = decoder_cells.zero_state(batch_size, tf.float32),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " bahdanau_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cells,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), 
[batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cells.zero_state(batch_size * beam_width, tf.float32),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, _reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.584728, avg accuracy: 0.048752\n", - "epoch: 2, avg loss: 6.195197, avg accuracy: 0.053916\n", - "epoch: 3, avg loss: 6.158399, avg accuracy: 0.055223\n", - "epoch: 4, avg loss: 6.075812, avg accuracy: 0.066991\n", - "epoch: 5, avg loss: 5.967982, avg accuracy: 0.074047\n", - "epoch: 6, avg loss: 5.866275, avg accuracy: 0.078856\n", - "epoch: 7, avg loss: 5.797270, avg accuracy: 0.073335\n", - "epoch: 8, avg loss: 5.744899, avg accuracy: 0.077205\n", - "epoch: 9, avg loss: 5.691444, avg accuracy: 0.077969\n", - "epoch: 10, avg 
loss: 5.635943, avg accuracy: 0.081965\n", - "epoch: 11, avg loss: 5.571292, avg accuracy: 0.082626\n", - "epoch: 12, avg loss: 5.491851, avg accuracy: 0.084104\n", - "epoch: 13, avg loss: 5.469053, avg accuracy: 0.082074\n", - "epoch: 14, avg loss: 5.413826, avg accuracy: 0.085476\n", - "epoch: 15, avg loss: 5.313320, avg accuracy: 0.086364\n", - "epoch: 16, avg loss: 5.203887, avg accuracy: 0.086610\n", - "epoch: 17, avg loss: 5.097645, avg accuracy: 0.089446\n", - "epoch: 18, avg loss: 5.020148, avg accuracy: 0.092671\n", - "epoch: 19, avg loss: 4.931250, avg accuracy: 0.090323\n", - "epoch: 20, avg loss: 4.855538, avg accuracy: 0.091473\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: bạn , , , , , , ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: tôi tôi tôi tôi tôi tôi tôi tôi \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi tôi tôi tôi thời thời \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: sau tôi , , , , , , tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi tôi \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/34.lstm-birnn-seq2seq-contrib-beam-luong.ipynb b/neural-machine-translation/34.lstm-birnn-seq2seq-contrib-beam-luong.ipynb new file mode 100644 index 
0000000..cb7a6b6 --- /dev/null +++ b/neural-machine-translation/34.lstm-birnn-seq2seq-contrib-beam-luong.ipynb @@ -0,0 +1,816 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, 
reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " encoder_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " 
training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = 
self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[20977, 20977, 20977, 20977, 20977, 20977, 29424, 29424, 29424,\n", + " 29424, 29424, 29424, 29424, 29424, 29424, 29424, 29424, 29424,\n", + " 29424, 29424, 29424, 29424, 
29424, 29424, 29424, 29424, 20993,\n", + " 20993, 20993, 20993, 27360, 27360, 27360, 1596, 1596, 673,\n", + " 673, 8445, 8445, 621, 621, 30890, 30890, 30890, 30890,\n", + " 30305, 30305, 30305, 30305, 30305, 30305, 21659, 21659, 21659,\n", + " 21659, 21659, 26107, 8298, 21312, 21312, 21312, 21312, 21312,\n", + " 21312, 27572, 27572, 27572, 27572, 27572, 27572, 806, 4463],\n", + " [20373, 20373, 15161, 14749, 14749, 14749, 14749, 14749, 14749,\n", + " 11060, 11060, 25364, 25364, 12627, 12627, 12627, 12627, 3887,\n", + " 3887, 3887, 3887, 3887, 8114, 26873, 26873, 26873, 26873,\n", + " 26873, 26873, 26873, 26873, 26873, 31653, 3531, 3531, 3531,\n", + " 3531, 8713, 8713, 8713, 8713, 8713, 26556, 26556, 27935,\n", + " 27935, 27935, 27935, 27477, 22346, 22346, 22346, 21570, 21570,\n", + " 21570, 21570, 23050, 23050, 8667, 8667, 8667, 8667, 12231,\n", + " 12231, 12231, 12231, 12231, 15382, 15382, 15382, 15382, 20699],\n", + " [10479, 10479, 25875, 16847, 16847, 16847, 14016, 14016, 14016,\n", + " 14016, 14016, 22190, 22190, 22190, 22190, 22190, 22190, 17210,\n", + " 17210, 17210, 12383, 12383, 12383, 12383, 12383, 24544, 24544,\n", + " 24544, 24544, 31683, 31683, 31683, 31683, 31683, 16246, 31683,\n", + " 31683, 16246, 31683, 31683, 16246, 16246, 31683, 12663, 12663,\n", + " 31683, 8077, 8077, 8077, 8077, 8077, 29973, 29973, 29973,\n", + " 17318, 31915, 31915, 31915, 31915, 31915, 31915, 16743, 16743,\n", + " 16743, 16743, 16743, 16743, 27717, 27717, 27717, 27717, 27717],\n", + " [24803, 24803, 18243, 8198, 8198, 8198, 21840, 8198, 21840,\n", + " 21840, 21840, 29434, 29434, 29434, 29434, 26054, 26054, 26054,\n", + " 26054, 26054, 4964, 4964, 4964, 4964, 25911, 25911, 2960,\n", + " 2960, 2960, 2960, 2960, 2960, 2960, 2960, 9320, 9320,\n", + " 8425, 8425, 8425, 8425, 8425, 8425, 8425, 26646, 82,\n", + " 82, 82, 82, 8425, 8425, 8425, 8425, 8425, 26646,\n", + " 26646, 26646, 26646, 82, 16478, 82, 82, 82, 10095,\n", + " 10095, 10095, 31326, 31326, 31326, 17031, 17031, 17031, 
17031],\n", + " [22956, 1386, 1386, 1386, 1386, 14434, 8003, 8003, 8003,\n", + " 8003, 24330, 24330, 28180, 6248, 28180, 28180, 28180, 2339,\n", + " 2339, 7253, 7253, 7253, 1269, 1269, 1269, 17645, 23568,\n", + " 23568, 23568, 23568, 23568, 23568, 17016, 17016, 1932, 17016,\n", + " 17016, 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392,\n", + " 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392,\n", + " 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392,\n", + " 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392, 6392],\n", + " [15938, 15938, 15938, 26740, 22984, 22984, 22984, 22984, 22984,\n", + " 22984, 22984, 9891, 28343, 28343, 28343, 28343, 28343, 28343,\n", + " 28343, 28343, 20191, 20191, 20191, 20191, 20191, 29948, 29948,\n", + " 1776, 28231, 28231, 28231, 28231, 28231, 28231, 27518, 8752,\n", + " 8752, 8752, 12160, 12160, 12160, 12160, 12160, 12160, 31127,\n", + " 31127, 31127, 27126, 27126, 27126, 27126, 27126, 27126, 10049,\n", + " 7644, 7644, 7644, 7644, 7644, 7644, 3893, 3893, 27039,\n", + " 27039, 27039, 27039, 12160, 12160, 12160, 12160, 25077, 25077],\n", + " [ 4142, 4142, 4142, 4142, 4142, 4142, 7214, 7214, 20148,\n", + " 20148, 12004, 12004, 12004, 10655, 10655, 613, 613, 613,\n", + " 613, 613, 9258, 9258, 20428, 20428, 20428, 20428, 9811,\n", + " 9811, 700, 700, 700, 700, 700, 700, 700, 19765,\n", + " 19765, 19765, 19765, 17763, 17763, 17763, 17763, 17763, 5897,\n", + " 5897, 9063, 9063, 9063, 18912, 18912, 18912, 18912, 21053,\n", + " 21053, 13047, 13047, 2896, 2896, 2896, 2896, 11783, 4898,\n", + " 29767, 29767, 4898, 4898, 25309, 25309, 25309, 25309, 25309],\n", + " [ 3024, 3024, 3024, 3024, 3024, 3024, 26412, 26412, 26412,\n", + " 26412, 26412, 26412, 26412, 26412, 12840, 12840, 10354, 12840,\n", + " 15087, 15087, 15087, 15087, 15087, 30821, 30821, 30821, 30821,\n", + " 15087, 29395, 29395, 29395, 29395, 15087, 15087, 15087, 15087,\n", + " 13810, 13810, 3024, 3024, 3024, 3024, 3024, 26113, 26113,\n", + " 748, 748, 748, 12942, 12942, 12942, 
12942, 12942, 31687,\n", + " 31687, 31687, 31687, 31687, 53, 53, 53, 53, 53,\n", + " 10520, 10520, 10520, 22954, 8840, 8840, 8840, 8840, 8840],\n", + " [ 8274, 22730, 22730, 22730, 31636, 29243, 29243, 29243, 29243,\n", + " 29243, 29243, 29243, 29243, 2787, 30159, 30159, 30159, 30159,\n", + " 30159, 30159, 30159, 30159, 27853, 27853, 27853, 27853, 27853,\n", + " 27853, 27853, 27853, 27853, 27853, 27853, 9257, 9257, 9257,\n", + " 31554, 31554, 31554, 25836, 25836, 25836, 25836, 25836, 25836,\n", + " 25836, 21844, 21844, 21844, 9903, 9903, 9903, 9903, 8274,\n", + " 17397, 17397, 17397, 8559, 1260, 1260, 1260, 1260, 1260,\n", + " 1260, 1260, 1260, 1260, 1260, 1260, 1260, 30254, 30254],\n", + " [31238, 1922, 1922, 26420, 12383, 12383, 31799, 31799, 31799,\n", + " 31799, 31799, 1526, 17804, 17804, 1526, 1526, 17804, 1526,\n", + " 1526, 1526, 25, 2122, 2122, 2122, 2122, 23907, 23907,\n", + " 23907, 26752, 26752, 20165, 20165, 20165, 20165, 20165, 20165,\n", + " 2626, 2626, 2626, 2626, 2626, 2626, 2626, 11814, 11814,\n", + " 11814, 11814, 11814, 14991, 12274, 12274, 12274, 12274, 29617,\n", + " 5510, 5510, 5510, 5510, 3630, 3630, 3630, 21651, 21651,\n", + " 21651, 21651, 19695, 19695, 11340, 11340, 21651, 5288, 12325]],\n", + " dtype=int32), 10.372833, 0.0]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [13:51<00:00, 1.88it/s, accuracy=0.374, cost=3.87]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.83it/s, accuracy=0.43, cost=3.42] \n", + "minibatch 
loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.20042004" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/35.byte-net.ipynb b/neural-machine-translation/35.byte-net.ipynb deleted file mode 100644 index efa96e6..0000000 --- a/neural-machine-translation/35.byte-net.ipynb +++ /dev/null @@ -1,453 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] 
>= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, 
dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_normalization(x, epsilon=1e-8):\n", - " shape = x.get_shape()\n", - " tf.Variable(tf.zeros(shape = [int(shape[-1])]))\n", - " beta = tf.Variable(tf.zeros(shape = [int(shape[-1])]))\n", - " gamma = tf.Variable(tf.ones(shape = [int(shape[-1])]))\n", - " mean, variance = tf.nn.moments(x, axes=[len(shape) - 1], keep_dims=True)\n", - " x = (x - mean) / tf.sqrt(variance + epsilon)\n", - " return gamma * x + beta\n", - "\n", - "def conv1d(input_, output_channels, dilation = 1, filter_width = 1, causal = False):\n", - " w = tf.Variable(tf.random_normal([1, filter_width, int(input_.get_shape()[-1]), output_channels], stddev = 0.02))\n", - " b = tf.Variable(tf.zeros(shape = [output_channels]))\n", - " if causal:\n", - " padding = [[0, 0], [(filter_width - 1) * dilation, 0], [0, 0]]\n", - " padded = tf.pad(input_, padding)\n", - " input_expanded = tf.expand_dims(padded, dim = 1)\n", - " out = tf.nn.atrous_conv2d(input_expanded, w, rate = dilation, padding = 'VALID') + b\n", - " else:\n", - " input_expanded = tf.expand_dims(input_, dim = 1)\n", - " out = tf.nn.atrous_conv2d(input_expanded, w, rate = dilation, padding = 'SAME') + b\n", - " return tf.squeeze(out, [1])\n", - "\n", - "def bytenet_residual_block(input_, dilation, layer_no, \n", - " residual_channels, filter_width, \n", - " causal = True):\n", - " block_type = \"decoder\" if causal else \"encoder\"\n", - " block_name = \"bytenet_{}_layer_{}_{}\".format(block_type, layer_no, dilation)\n", - " with tf.variable_scope(block_name):\n", - " relu1 = tf.nn.relu(layer_normalization(input_))\n", - " conv1 = conv1d(relu1, residual_channels)\n", - " relu2 = 
tf.nn.relu(layer_normalization(conv1))\n", - " dilated_conv = conv1d(relu2, residual_channels, \n", - " dilation, filter_width,\n", - " causal = causal)\n", - " print(dilated_conv)\n", - " relu3 = tf.nn.relu(layer_normalization(dilated_conv))\n", - " conv2 = conv1d(relu3, 2 * residual_channels)\n", - " return input_ + conv2\n", - " \n", - "class ByteNet:\n", - " def __init__(self, from_vocab_size, to_vocab_size, channels, encoder_dilations,\n", - " decoder_dilations, encoder_filter_width, decoder_filter_width,\n", - " learning_rate = 0.001, beta1=0.5):\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " target_1 = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " embedding_channels = 2 * channels\n", - " w_source_embedding = tf.Variable(tf.random_normal([from_vocab_size, \n", - " embedding_channels], stddev = 0.02))\n", - " w_target_embedding = tf.Variable(tf.random_normal([to_vocab_size, \n", - " embedding_channels], stddev = 0.02))\n", - " source_embedding = tf.nn.embedding_lookup(w_source_embedding, self.X)\n", - " target_1_embedding = tf.nn.embedding_lookup(w_target_embedding, target_1)\n", - " curr_input = source_embedding\n", - " for layer_no, dilation in enumerate(encoder_dilations):\n", - " curr_input = bytenet_residual_block(curr_input, dilation, \n", - " layer_no, channels, \n", - " encoder_filter_width, \n", - " causal = False)\n", - " encoder_output = curr_input\n", - " combined_embedding = target_1_embedding + encoder_output\n", - " curr_input = combined_embedding\n", - " for layer_no, dilation in enumerate(decoder_dilations):\n", - " curr_input = bytenet_residual_block(curr_input, dilation, \n", - " layer_no, channels, \n", 
- " encoder_filter_width, \n", - " causal = False)\n", - " self.logits = conv1d(curr_input, to_vocab_size)\n", - " masks = tf.sequence_mask(self.Y_seq_len, maxlen_answer, dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "residual_channels = 128\n", - "encoder_dilations = [1,2,4,8,16,1,2,4,8,16]\n", - "decoder_dilations = [1,2,4,8,16,1,2,4,8,16]\n", - "encoder_filter_width = 3\n", - "decoder_filter_width = 3\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :19: calling expand_dims (from tensorflow.python.ops.array_ops) with dim is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use the `axis` argument instead\n", - "Tensor(\"bytenet_encoder_layer_0_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_1_2/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_2_4/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_3_8/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_4_16/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - 
"Tensor(\"bytenet_encoder_layer_5_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_6_2/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_7_4/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_8_8/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_9_16/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_0_1_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_1_2_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_2_4_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_3_8_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_4_16_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_5_1_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_6_2_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_7_4_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_8_8_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"bytenet_encoder_layer_9_16_1/Squeeze_1:0\", shape=(?, ?, 128), dtype=float32)\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = ByteNet(len(dictionary_from), len(dictionary_to), \n", - " residual_channels, encoder_dilations, decoder_dilations,\n", - " encoder_filter_width,decoder_filter_width)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.195900, avg accuracy: 0.882818\n", - "epoch: 2, avg loss: 0.739483, avg accuracy: 0.912555\n", - "epoch: 
3, avg loss: 0.714763, avg accuracy: 0.914545\n", - "epoch: 4, avg loss: 0.718540, avg accuracy: 0.913191\n", - "epoch: 5, avg loss: 0.698483, avg accuracy: 0.915082\n", - "epoch: 6, avg loss: 0.672405, avg accuracy: 0.917509\n", - "epoch: 7, avg loss: 0.652911, avg accuracy: 0.919900\n", - "epoch: 8, avg loss: 0.627932, avg accuracy: 0.922464\n", - "epoch: 9, avg loss: 0.588484, avg accuracy: 0.927682\n", - "epoch: 10, avg loss: 0.515681, avg accuracy: 0.936664\n", - "epoch: 11, avg loss: 0.458144, avg accuracy: 0.941473\n", - "epoch: 12, avg loss: 0.373364, avg accuracy: 0.951845\n", - "epoch: 13, avg loss: 0.301437, avg accuracy: 0.961718\n", - "epoch: 14, avg loss: 0.227614, avg accuracy: 0.973909\n", - "epoch: 15, avg loss: 0.169901, avg accuracy: 0.984582\n", - "epoch: 16, avg loss: 0.123547, avg accuracy: 0.995145\n", - "epoch: 17, avg loss: 0.080864, avg accuracy: 1.005564\n", - "epoch: 18, avg loss: 0.046408, avg accuracy: 1.015191\n", - "epoch: 19, avg loss: 0.029291, avg accuracy: 1.019073\n", - "epoch: 20, avg loss: 0.015275, avg accuracy: 1.022409\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - 
"cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: i 've actually chosen to take a different kind of risk .\n", - "REAL ANSWER: thật ra tôi đã chọn một loại mạo hiểm khác .\n", - "PREDICTED ANSWER: thật ra tôi đã chọn một loại mạo hiểm khác . \n", - "\n", - "row 2\n", - "QUESTION: this is very promising .\n", - "REAL ANSWER: đây thật sự là một điều rất hứa hẹn ,\n", - "PREDICTED ANSWER: đây thật sự là một điều rất hứa hẹn , \n", - "\n", - "row 3\n", - "QUESTION: our models have hundreds of thousands of grid boxes calculating hundreds of variables each , on minute timescales .\n", - "REAL ANSWER: mô hình của chúng tôi gồm hàng trăm ngàn thùng xếp chồng tính toán với hàng trăm biến số trong thời gian cực ngắn .\n", - "PREDICTED ANSWER: mô hình của chúng tôi gồm hàng trăm ngàn thùng xếp chồng tính toán với hàng trăm biến số trong thời gian cực ngắn . \n", - "\n", - "row 4\n", - "QUESTION: this is why we are here .\n", - "REAL ANSWER: đây là lí do tại sao chúng tôi tới đây .\n", - "PREDICTED ANSWER: đây là lí do tại sao chúng tôi tới đây . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/35.gru-birnn-seq2seq-contrib-beam-bahdanau.ipynb b/neural-machine-translation/35.gru-birnn-seq2seq-contrib-beam-bahdanau.ipynb new file mode 100644 index 0000000..5b9b45e --- /dev/null +++ b/neural-machine-translation/35.gru-birnn-seq2seq-contrib-beam-bahdanau.ipynb @@ -0,0 +1,818 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = 
data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = 
tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = 
attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + 
}, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[18140, 18140, 18140, 18140, 18538, 18538, 25642, 25642, 25642,\n", + " 25642, 25642, 25642, 5822, 13819, 7190, 5822, 5822, 5822,\n", + " 5822, 5822, 5822, 5822, 14383, 14383, 14383, 14383, 14383,\n", + " 14383, 31192, 31192, 31192, 31192, 12526, 12526, 12526, 12526,\n", + " 12526, 8800, 8800, 8800, 28067, 28067, 28067, 6796, 6796,\n", + " 6796, 6796, 6796, 6796, 6796, 6796, 6796, 6796, 16628,\n", + " 20674, 20674, 20674, 20674, 20674, 16628, 5393, 5393, 5393,\n", + " 5393, 5393, 3229, 3229, 3229, 3229, 3229, 3229, 3229],\n", + " [17380, 1074, 14817, 14817, 14817, 14817, 14817, 14817, 14817,\n", + " 1122, 1122, 16566, 16566, 16566, 16566, 16566, 16566, 16566,\n", + " 24158, 28048, 28048, 23610, 23610, 23610, 21941, 21941, 11793,\n", + " 11793, 11793, 19250, 19250, 19250, 19250, 12709, 12709, 12709,\n", + " 319, 24797, 14270, 14270, 30986, 30986, 30986, 11559, 11559,\n", + " 11559, 11559, 11559, 11559, 29029, 29029, 29029, 29029, 27680,\n", + " 27680, 27680, 16100, 16100, 16100, 19772, 19772, 8395, 8395,\n", + " 8395, 8395, 30167, 15547, 15547, 9121, 9121, 9121, 9121],\n", + " [ 3383, 3383, 29262, 16672, 18901, 18901, 18901, 18901, 25906,\n", + " 15420, 15420, 1544, 25001, 25001, 10331, 10331, 10331, 13048,\n", + " 18915, 18915, 24900, 24900, 24900, 24900, 24900, 1091, 1091,\n", + " 1091, 29541, 29541, 29541, 9739, 9739, 9739, 9739, 8630,\n", + " 24178, 24178, 18954, 18954, 18954, 28012, 28012, 28012, 4909,\n", + " 22189, 22189, 22189, 29735, 29735, 29735, 29735, 20393, 20393,\n", + " 20393, 3663, 3663, 3663, 3663, 3663, 3663, 2790, 2790,\n", + " 2790, 2790, 7288, 7288, 7288, 7288, 7288, 3852, 3852],\n", + " [23769, 23769, 23769, 23769, 2938, 2938, 2938, 2938, 
29517,\n", + " 29517, 29019, 29019, 29019, 29019, 29019, 29019, 29019, 29019,\n", + " 2081, 7922, 7922, 7922, 28400, 28400, 28400, 28400, 31007,\n", + " 21846, 21846, 21846, 21846, 30198, 30198, 30198, 5, 5,\n", + " 5, 5, 5, 5, 5, 5, 5, 16460, 22531,\n", + " 13416, 13416, 13416, 13416, 13416, 13416, 13416, 10, 13863,\n", + " 13863, 13863, 13863, 13863, 13416, 12854, 12854, 12854, 12854,\n", + " 12854, 30945, 30815, 30815, 30815, 30815, 30945, 30945, 18687],\n", + " [ 4022, 4022, 19580, 20005, 20005, 20005, 20005, 18837, 18837,\n", + " 7219, 7219, 5373, 18673, 23298, 18160, 18160, 18160, 23926,\n", + " 23926, 23926, 23926, 19933, 19933, 19933, 19933, 19933, 29133,\n", + " 29133, 29133, 9655, 29133, 9655, 9655, 9655, 3871, 3871,\n", + " 11556, 5383, 22365, 22365, 22365, 22365, 22365, 22365, 1581,\n", + " 1581, 1581, 28414, 28414, 28414, 2171, 6330, 6330, 6330,\n", + " 6330, 1918, 1918, 9071, 9071, 27333, 27333, 22457, 22457,\n", + " 22457, 22457, 22457, 12372, 12372, 12372, 12372, 12372, 28203],\n", + " [12301, 12301, 12301, 12301, 17963, 17963, 16843, 16843, 16843,\n", + " 16843, 16843, 7101, 7101, 16843, 7101, 7101, 7101, 7101,\n", + " 7101, 7101, 7101, 16843, 402, 402, 402, 24149, 24149,\n", + " 19569, 19569, 19569, 19569, 19569, 19569, 7963, 7963, 7963,\n", + " 7963, 18890, 18890, 18890, 9159, 9159, 9159, 9159, 9159,\n", + " 9159, 9159, 9159, 9159, 8674, 8674, 8674, 8674, 8674,\n", + " 8674, 8674, 8674, 8674, 8674, 20037, 20037, 26776, 26776,\n", + " 26776, 26776, 26776, 26776, 18201, 18201, 18201, 18201, 26939],\n", + " [20181, 23988, 1683, 1683, 1683, 1683, 1683, 17091, 16021,\n", + " 16021, 16021, 16021, 10841, 10841, 27910, 27910, 27910, 27910,\n", + " 4626, 4626, 4626, 4626, 26328, 26328, 26328, 10217, 10217,\n", + " 1135, 1135, 1135, 1135, 1135, 1135, 28421, 28421, 4415,\n", + " 4415, 4415, 4415, 4415, 4415, 4415, 4415, 4415, 4415,\n", + " 4415, 16277, 16277, 1424, 1424, 1424, 1424, 1424, 1424,\n", + " 27993, 22766, 22766, 22766, 9386, 30514, 30514, 30514, 
27356,\n", + " 1963, 1963, 1963, 1963, 17758, 10770, 10770, 10770, 10770],\n", + " [31079, 31079, 31079, 25450, 25450, 25450, 22725, 8819, 8819,\n", + " 8819, 29384, 29384, 3183, 12071, 12071, 12071, 12071, 12071,\n", + " 12071, 14257, 17658, 10729, 10729, 10729, 10729, 10729, 10729,\n", + " 10729, 2764, 2764, 2764, 10632, 10632, 10632, 10632, 10632,\n", + " 10632, 10632, 16977, 16977, 16977, 16977, 16977, 16977, 8831,\n", + " 16977, 8831, 8831, 8831, 8831, 15007, 15007, 15007, 23132,\n", + " 29448, 29448, 29448, 29448, 15415, 15415, 15415, 15415, 15415,\n", + " 15415, 15415, 15415, 15415, 15415, 15415, 15415, 2480, 2480],\n", + " [30349, 29392, 18761, 29738, 29738, 29738, 29738, 29738, 29738,\n", + " 29738, 5029, 19882, 19882, 19882, 19882, 19882, 19882, 16846,\n", + " 11590, 11590, 11590, 11590, 4653, 4653, 4653, 4653, 4653,\n", + " 29297, 29297, 6060, 6060, 6060, 6060, 6060, 6060, 29133,\n", + " 29133, 29133, 29133, 29133, 29133, 3488, 3488, 17236, 19089,\n", + " 17236, 5137, 5137, 7242, 5137, 5137, 5137, 5137, 19882,\n", + " 19882, 19882, 19882, 19882, 2795, 2795, 2795, 2795, 2795,\n", + " 2795, 2795, 21894, 2795, 21894, 12287, 21894, 13759, 13759],\n", + " [25731, 10124, 22909, 22909, 22909, 22909, 22909, 9419, 29090,\n", + " 29090, 29090, 29090, 29090, 1175, 3680, 3680, 17309, 17309,\n", + " 17309, 17309, 6905, 26935, 26935, 26935, 26935, 16614, 16662,\n", + " 26935, 16662, 16662, 16662, 16662, 10365, 10365, 10286, 10286,\n", + " 10286, 10286, 25705, 5662, 5662, 5662, 5662, 5662, 30029,\n", + " 30029, 30029, 6533, 6533, 6533, 6533, 6533, 4685, 4685,\n", + " 18636, 18636, 18636, 18636, 30850, 12711, 12711, 1500, 9952,\n", + " 9952, 1500, 15338, 15338, 15338, 15338, 15338, 10917, 10917]],\n", + " dtype=int32), 10.375526, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", 
+ "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [13:49<00:00, 1.88it/s, accuracy=0.379, cost=3.76]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.89it/s, accuracy=0.43, cost=3.19] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.1784567" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/36.estimator.ipynb b/neural-machine-translation/36.estimator.ipynb deleted file mode 100644 index e8416db..0000000 --- a/neural-machine-translation/36.estimator.ipynb +++ /dev/null @@ -1,285 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import 
numpy as np\n", - "import tensorflow as tf\n", - "from tensorflow.python.layers.core import Dense\n", - "import collections" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words):\n", - " count = [['GO', 0], ['PAD', 1], ['EOS', 2], ['UNK', 3]]\n", - " count.extend(collections.Counter(words).most_common(n_words - 1))\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open('from.txt', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')\n", - "with open('to.txt', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - 
"data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, batch_size,\n", - " from_dict_size, to_dict_size, grad_clip=5.0):\n", - " self.size_layer = size_layer\n", - " self.num_layers = num_layers\n", - " self.embedded_size = embedded_size\n", - " self.grad_clip = grad_clip\n", - " self.from_dict_size = from_dict_size\n", - " self.to_dict_size = to_dict_size\n", - " self.batch_size = batch_size\n", - " self.model = tf.estimator.Estimator(self.model_fn)\n", - " \n", - " def lstm_cell(self, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(self.size_layer, reuse=reuse)\n", - " \n", - " def seq2seq(self, x_dict, reuse):\n", - " x = x_dict['x']\n", - " x_seq_len = x_dict['x_len']\n", - " with tf.variable_scope('encoder', reuse=reuse):\n", - " encoder_embedding = tf.get_variable('encoder_embedding') if reuse else tf.get_variable('encoder_embedding', \n", - " [self.from_dict_size, self.embedded_size], \n", - " tf.float32, tf.random_uniform_initializer(-1.0, 1.0))\n", - " _, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([self.lstm_cell() for _ in range(self.num_layers)]), \n", - " inputs = tf.nn.embedding_lookup(encoder_embedding, x),\n", - " sequence_length = x_seq_len,\n", - " dtype = tf.float32)\n", - " encoder_state = tuple(encoder_state[-1] for _ in 
range(self.num_layers))\n", - " if not reuse:\n", - " y = x_dict['y']\n", - " y_seq_len = x_dict['y_len']\n", - " with tf.variable_scope('decoder', reuse=reuse):\n", - " decoder_embedding = tf.get_variable(\n", - " 'decoder_embedding', [self.to_dict_size, self.embedded_size], tf.float32,\n", - " tf.random_uniform_initializer(-1.0, 1.0))\n", - " helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, y),\n", - " sequence_length = y_seq_len,\n", - " time_major = False)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([self.lstm_cell() for _ in range(self.num_layers)]),\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = tf.layers.Dense(self.to_dict_size))\n", - " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(y_seq_len))\n", - " return decoder_output.rnn_output\n", - " else:\n", - " with tf.variable_scope('decoder', reuse=reuse):\n", - " helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = tf.get_variable('decoder_embedding'),\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [tf.shape(x)[0]]),\n", - " end_token = EOS)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell(\n", - " [self.lstm_cell(reuse=True) for _ in range(self.num_layers)]),\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = tf.layers.Dense(self.to_dict_size, _reuse=reuse))\n", - " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(x_seq_len))\n", - " return decoder_output.sample_id\n", - " \n", - " def model_fn(self, features, labels, mode):\n", - " logits = self.seq2seq(features, reuse=False)\n", - " predictions = self.seq2seq(features, 
reuse=True)\n", - " if mode == tf.estimator.ModeKeys.PREDICT:\n", - " return tf.estimator.EstimatorSpec(mode, predictions=predictions)\n", - " y_seq_len = features['y_len']\n", - " masks = tf.sequence_mask(y_seq_len, tf.reduce_max(y_seq_len), dtype=tf.float32)\n", - " loss_op = tf.contrib.seq2seq.sequence_loss(logits = logits, targets = labels, weights = masks)\n", - " params = tf.trainable_variables()\n", - " gradients = tf.gradients(loss_op, params)\n", - " clipped_gradients, _ = tf.clip_by_global_norm(gradients, self.grad_clip)\n", - " train_op = tf.train.AdamOptimizer().apply_gradients(zip(clipped_gradients, params),\n", - " global_step=tf.train.get_global_step())\n", - " acc_op = tf.metrics.accuracy(labels=labels, predictions=predictions)\n", - " estim_specs = tf.estimator.EstimatorSpec(\n", - " mode = mode,\n", - " predictions = predictions,\n", - " loss = loss_op,\n", - " train_op = train_op,\n", - " eval_metric_ops = {'accuracy': acc_op})\n", - " return estim_specs" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 256\n", - "batch_size = len(text_from)\n", - "model = Chatbot(size_layer, num_layers, embedded_size, batch_size,\n", - " vocabulary_size_from + 4, vocabulary_size_to + 4)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " 
padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return np.array(padded_seqs).astype(np.int32), np.array(seq_lens).astype(np.int32)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "batch_x, seq_x = pad_sentence_batch(X, PAD)\n", - "batch_y, seq_y = pad_sentence_batch(Y, PAD)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "input_fn = tf.estimator.inputs.numpy_input_fn(\n", - " x={'x':batch_x, 'x_len':seq_x, 'y':batch_y, 'y_len':seq_y}, y=batch_y,\n", - " batch_size=batch_size, num_epochs=100, shuffle=False)\n", - "model.model.train(input_fn)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/36.gru-birnn-seq2seq-contrib-beam-luong.ipynb b/neural-machine-translation/36.gru-birnn-seq2seq-contrib-beam-luong.ipynb new file mode 100644 index 0000000..d5ef386 --- /dev/null +++ b/neural-machine-translation/36.gru-birnn-seq2seq-contrib-beam-luong.ipynb @@ -0,0 +1,822 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": 
"code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len 
= tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " 
encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + 
"embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[29054, 17002, 17002, 17002, 17002, 29188, 29188, 29188, 29188,\n", + " 29188, 4369, 4369, 4369, 4369, 4369, 26533, 26533, 26533,\n", + " 26533, 26533, 26533, 15700, 15700, 23752, 23752, 23752, 23752,\n", + " 23752, 17901, 17901, 17901, 14067, 14067, 4322, 4322, 4322,\n", + " 4322, 4322, 4322, 4322, 4322, 4322, 4322, 4322, 4322,\n", + " 4322, 4322, 4322, 27402, 4322, 4322, 4322, 4322, 26355,\n", + " 8794, 4322, 4322, 4322, 4322, 
4322, 4322, 4322, 26355,\n", + " 4322, 26355, 4322, 26355, 4322, 4322, 4322, 4322, 26355],\n", + " [21304, 11233, 11233, 310, 310, 310, 310, 5081, 5081,\n", + " 5081, 5081, 10446, 10446, 10446, 2600, 2600, 2600, 31169,\n", + " 31169, 31169, 2008, 2008, 2008, 2008, 28528, 28528, 25165,\n", + " 25165, 25165, 25165, 25165, 25165, 25165, 31786, 31786, 31786,\n", + " 20197, 20197, 20197, 30103, 30103, 30103, 30103, 30103, 30103,\n", + " 19837, 19837, 19837, 19370, 19370, 19370, 19370, 19370, 19370,\n", + " 19370, 19370, 30038, 30038, 30038, 30038, 30038, 7650, 7650,\n", + " 7650, 12404, 12404, 12404, 12404, 11773, 11773, 11773, 22618],\n", + " [21904, 30738, 12727, 12727, 12727, 21904, 15048, 15048, 15048,\n", + " 15048, 15048, 15048, 19380, 19380, 19380, 24681, 19877, 19877,\n", + " 19877, 19877, 19877, 27423, 27423, 19380, 19380, 19380, 19380,\n", + " 19380, 19380, 19380, 20462, 20462, 20462, 20462, 20462, 10701,\n", + " 10701, 10701, 10701, 10701, 10701, 29174, 29174, 29174, 13367,\n", + " 13367, 13367, 13367, 13367, 22744, 22744, 28435, 28435, 28435,\n", + " 19869, 19869, 19869, 19869, 19869, 18273, 5522, 3681, 3681,\n", + " 3681, 15420, 15420, 9093, 16951, 16951, 16951, 16397, 16397],\n", + " [29150, 29150, 29150, 29150, 30315, 29150, 30315, 30315, 10361,\n", + " 10361, 10361, 10361, 11234, 11234, 11234, 11234, 11234, 11234,\n", + " 15361, 15361, 15361, 15361, 15361, 15361, 23738, 23738, 23738,\n", + " 23738, 7933, 7933, 15460, 7933, 17521, 17521, 17521, 17521,\n", + " 18394, 17521, 18394, 18394, 18394, 18394, 18394, 18394, 28116,\n", + " 28116, 28116, 28275, 28275, 28275, 10213, 10213, 315, 315,\n", + " 315, 315, 315, 15606, 315, 315, 315, 315, 315,\n", + " 15460, 15460, 315, 12982, 12982, 5465, 12982, 12982, 12982],\n", + " [ 2546, 2546, 2546, 18600, 18600, 21638, 21638, 21638, 21638,\n", + " 15413, 15413, 8068, 8068, 8068, 8068, 8068, 8068, 8691,\n", + " 24592, 8691, 8691, 8691, 8691, 8691, 8691, 3531, 3531,\n", + " 17984, 15211, 15211, 15211, 5236, 15211, 5236, 
5236, 5236,\n", + " 22853, 22853, 25234, 25234, 25234, 25234, 25234, 25234, 28436,\n", + " 28436, 11506, 11506, 11506, 11506, 11506, 26154, 26154, 23428,\n", + " 23428, 23428, 19163, 19163, 19163, 19163, 19163, 4930, 4930,\n", + " 4930, 4930, 27581, 27581, 27581, 475, 475, 30738, 30738],\n", + " [16042, 16042, 16042, 16042, 23143, 23143, 23143, 23143, 23143,\n", + " 8320, 8320, 8320, 8320, 11501, 11501, 11501, 11501, 11501,\n", + " 5446, 24461, 24461, 24461, 24461, 10108, 18449, 15210, 15210,\n", + " 15210, 24461, 24461, 24461, 26260, 26260, 26260, 26260, 26260,\n", + " 17215, 5674, 595, 595, 595, 14899, 14899, 21855, 21855,\n", + " 21855, 13366, 13366, 13366, 28961, 28961, 28961, 28961, 13313,\n", + " 13313, 13313, 13313, 13313, 13313, 22509, 22509, 23536, 23536,\n", + " 23536, 13474, 13474, 13474, 3606, 3606, 27964, 27964, 2910],\n", + " [31672, 15751, 15751, 15751, 15751, 29982, 29982, 29982, 29982,\n", + " 29982, 29982, 6691, 6691, 6691, 6691, 19202, 804, 11545,\n", + " 11545, 11545, 29270, 2926, 2926, 2926, 5451, 5451, 5451,\n", + " 5451, 31389, 31389, 31389, 31389, 31389, 11257, 11257, 11257,\n", + " 11257, 11257, 11257, 24667, 24667, 11257, 11257, 11257, 11257,\n", + " 11257, 11257, 11257, 13073, 22261, 4504, 4504, 4504, 4504,\n", + " 4504, 4504, 4504, 4504, 4504, 7543, 3411, 3411, 3411,\n", + " 3411, 30503, 29420, 5057, 5057, 5057, 3411, 3411, 3411],\n", + " [ 9058, 9058, 22868, 22868, 15289, 15289, 15289, 8739, 8739,\n", + " 8739, 29285, 15289, 29285, 29285, 29285, 29285, 29285, 11970,\n", + " 28560, 28560, 11970, 7606, 7421, 7421, 7421, 7421, 7421,\n", + " 7421, 15289, 15289, 15289, 15289, 15289, 15058, 15058, 15058,\n", + " 15058, 23060, 2101, 2101, 21165, 21165, 21165, 23172, 23172,\n", + " 23172, 23172, 6235, 6235, 6235, 6235, 6235, 6235, 21206,\n", + " 21206, 21206, 21206, 21206, 21206, 23323, 7110, 7110, 7110,\n", + " 7110, 7110, 7110, 7110, 24070, 4384, 24070, 24070, 11008],\n", + " [26688, 26688, 26688, 26688, 26688, 26688, 15001, 15001, 15001,\n", 
+ " 15001, 15001, 15001, 15001, 15001, 29969, 29969, 29969, 29969,\n", + " 15062, 15062, 15062, 8506, 8506, 8506, 8506, 8506, 21168,\n", + " 21168, 21168, 21168, 21168, 21168, 21168, 3161, 20092, 20092,\n", + " 3161, 15116, 15174, 15174, 15174, 15174, 15174, 15174, 15174,\n", + " 15174, 15174, 3147, 3147, 3147, 3147, 3147, 5803, 5803,\n", + " 5803, 5803, 5803, 20483, 6938, 6938, 6938, 6938, 1853,\n", + " 1853, 9980, 9980, 9980, 21712, 21712, 21712, 21712, 26688],\n", + " [28990, 22146, 22146, 22146, 18968, 18968, 4045, 4045, 4045,\n", + " 4045, 4045, 4045, 4045, 4045, 13103, 13103, 13103, 13103,\n", + " 13103, 2802, 2802, 2802, 19216, 19216, 19216, 19216, 19216,\n", + " 19216, 925, 925, 29893, 17624, 17624, 17624, 22910, 17624,\n", + " 22910, 17624, 22910, 17624, 22910, 22910, 3858, 4879, 4879,\n", + " 4879, 3858, 3858, 11849, 11849, 11849, 11849, 17702, 17702,\n", + " 17702, 17702, 19875, 19875, 19875, 19875, 19875, 19875, 10799,\n", + " 10799, 10799, 10799, 16468, 4941, 4941, 4941, 18317, 18317]],\n", + " dtype=int32), 10.373235, 0.0]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [13:56<00:00, 1.87it/s, accuracy=0.293, cost=4.43]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.82it/s, accuracy=0.382, cost=3.81]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + 
" rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0557322" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/37.capsule-lstm-seq2seq-greedy.ipynb b/neural-machine-translation/37.capsule-lstm-seq2seq-greedy.ipynb deleted file mode 100644 index 6637c1e..0000000 --- a/neural-machine-translation/37.capsule-lstm-seq2seq-greedy.ipynb +++ /dev/null @@ -1,612 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data 
= list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lines = open('movie_lines.txt', encoding='utf-8', errors='ignore').read().split('\\n')\n", - "conv_lines = open('movie_conversations.txt', encoding='utf-8', errors='ignore').read().split('\\n')\n", - "\n", - "id2line = {}\n", - "for line in lines:\n", - " _line = line.split(' +++$+++ ')\n", - " if len(_line) == 5:\n", - " id2line[_line[0]] = _line[4]\n", - " \n", - "convs = [ ]\n", - "for line in conv_lines[:-1]:\n", - " _line = line.split(' +++$+++ ')[-1][1:-1].replace(\"'\",\"\").replace(\" \",\"\")\n", - " convs.append(_line.split(','))\n", - " \n", - "questions = []\n", - "answers = []\n", - "\n", - "for conv in convs:\n", - " for i in range(len(conv)-1):\n", - " questions.append(id2line[conv[i]])\n", - " answers.append(id2line[conv[i+1]])\n", - " \n", - "def clean_text(text):\n", - " text = text.lower()\n", - " text = re.sub(r\"i'm\", \"i am\", text)\n", - " text = re.sub(r\"he's\", \"he is\", text)\n", - " text = re.sub(r\"she's\", \"she is\", text)\n", - " text = re.sub(r\"it's\", \"it is\", text)\n", - " text = re.sub(r\"that's\", \"that is\", text)\n", - " text = re.sub(r\"what's\", \"that is\", text)\n", - " text = re.sub(r\"where's\", \"where is\", text)\n", - " text = re.sub(r\"how's\", \"how is\", text)\n", - " text = re.sub(r\"\\'ll\", \" will\", text)\n", - " text = re.sub(r\"\\'ve\", \" have\", text)\n", - " text = re.sub(r\"\\'re\", \" are\", text)\n", - " text = re.sub(r\"\\'d\", \" would\", text)\n", - " text = re.sub(r\"\\'re\", \" are\", text)\n", - " text = re.sub(r\"won't\", \"will not\", 
text)\n", - " text = re.sub(r\"can't\", \"cannot\", text)\n", - " text = re.sub(r\"n't\", \" not\", text)\n", - " text = re.sub(r\"n'\", \"ng\", text)\n", - " text = re.sub(r\"'bout\", \"about\", text)\n", - " text = re.sub(r\"'til\", \"until\", text)\n", - " text = re.sub(r\"[-()\\\"#/@;:<>{}`+=~|.!?,]\", \"\", text)\n", - " return ' '.join([i.strip() for i in filter(None, text.split())])\n", - "\n", - "clean_questions = []\n", - "for question in questions:\n", - " clean_questions.append(clean_text(question))\n", - " \n", - "clean_answers = [] \n", - "for answer in answers:\n", - " clean_answers.append(clean_text(answer))\n", - " \n", - "min_line_length = 2\n", - "max_line_length = 5\n", - "short_questions_temp = []\n", - "short_answers_temp = []\n", - "\n", - "i = 0\n", - "for question in clean_questions:\n", - " if len(question.split()) >= min_line_length and len(question.split()) <= max_line_length:\n", - " short_questions_temp.append(question)\n", - " short_answers_temp.append(clean_answers[i])\n", - " i += 1\n", - "\n", - "short_questions = []\n", - "short_answers = []\n", - "\n", - "i = 0\n", - "for answer in short_answers_temp:\n", - " if len(answer.split()) >= min_line_length and len(answer.split()) <= max_line_length:\n", - " short_answers.append(answer)\n", - " short_questions.append(short_questions_temp[i])\n", - " i += 1\n", - "\n", - "question_test = short_questions[500:550]\n", - "answer_test = short_answers[500:550]\n", - "short_questions = short_questions[:500]\n", - "short_answers = short_answers[:500]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_from = ' '.join(short_questions+question_test).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', 
count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])\n", - "print('filtered vocab size:',len(dictionary_from))\n", - "print(\"% of vocab used: {}%\".format(round(len(dictionary_from)/vocabulary_size_from,4)*100))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_to = ' '.join(short_answers+answer_test).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab from size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])\n", - "print('filtered vocab size:',len(dictionary_to))\n", - "print(\"% of vocab used: {}%\".format(round(len(dictionary_to)/vocabulary_size_to,4)*100))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(short_answers)):\n", - " short_answers[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def squash(X, epsilon = 1e-9):\n", - " vec_squared_norm = tf.reduce_sum(tf.square(X), -2, keep_dims=True)\n", - " scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + epsilon)\n", - " return scalar_factor * X\n", - "\n", - "def conv_layer(X, num_output, num_vector, kernel=None, stride=None):\n", - " capsules = tf.layers.conv1d(X, num_output * num_vector,\n", - " kernel, stride, padding=\"VALID\", activation=tf.nn.relu)\n", - " capsules = 
tf.reshape(capsules, (tf.shape(X)[0], -1, num_vector, 1))\n", - " return squash(capsules)\n", - "\n", - "def routing(X, b_IJ, seq_len, dimension_out, routing_times = 2):\n", - " shape_X = tf.shape(X)[1]\n", - " w = tf.Variable(tf.truncated_normal([1, 1, seq_len, 4, dimension_out//2], stddev=1e-1))\n", - " X = tf.tile(X, [1, 1, seq_len, 1, dimension_out])\n", - " w = tf.tile(w, [tf.shape(X)[0], tf.shape(X)[1], 1, 1, routing_times])\n", - " print('X shape: %s, w shape: %s'%(str(X.shape), str(w.shape)))\n", - " u_hat = tf.matmul(w, X, transpose_a=True)\n", - " u_hat_stopped = tf.stop_gradient(u_hat)\n", - " print(u_hat,u_hat_stopped)\n", - " for i in range(routing_times):\n", - " c_IJ = tf.nn.softmax(b_IJ, dim=2)\n", - " print(c_IJ)\n", - " if i == routing_times - 1:\n", - " s_J = tf.multiply(c_IJ, u_hat)\n", - " s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)\n", - " v_J = squash(s_J)\n", - " else:\n", - " s_J = tf.multiply(c_IJ, u_hat_stopped)\n", - " s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)\n", - " v_J = squash(s_J)\n", - " v_J_tiled = tf.tile(v_J, [1, shape_X, 1, 1, 1])\n", - " u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)\n", - " b_IJ += u_produce_v\n", - " return v_J\n", - "\n", - "def fully_conn_layer(X, num_output, dimension_out):\n", - " batch_size = tf.shape(X)[1]\n", - " X_ = tf.reshape(X, shape=(tf.shape(X)[0], -1, 1, X.shape[-2].value, 1))\n", - " b_IJ = tf.fill([tf.shape(X)[0], tf.shape(X)[1], num_output, 1, 1], 0.0)\n", - " capsules = routing(X_, b_IJ, num_output, dimension_out, routing_times = 2)\n", - " capsules = tf.squeeze(capsules, axis=1)\n", - " return capsules\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, seq_len, maxlen,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size,\n", - " kernels=[2, 4, 4],strides=[3,2,1],epsilon=1e-8):\n", - " \n", - " def cells(reuse=False):\n", - " return 
tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, maxlen])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, self.X)\n", - " \n", - " results = []\n", - " for i in range(len(kernels)):\n", - " conv = tf.layers.conv1d(encoder_embedded, filters=32,\n", - " kernel_size=kernels[i], strides=strides[i],\n", - " padding='VALID')\n", - " caps1 = conv_layer(conv, 4, 4, kernels[i], strides[i])\n", - " caps2 = fully_conn_layer(caps1,seq_len,embedded_size)\n", - " print(caps2)\n", - " v_length = tf.sqrt(tf.reduce_sum(tf.square(caps2),axis=2, keep_dims=True) + epsilon)[:,:,0,:]\n", - " print('output shape: %s\\n'%(str(v_length.shape)))\n", - " results.append(v_length)\n", - " results = tf.concat(results,1)\n", - " self.X_seq_len = tf.fill([batch_size], seq_len * len(kernels))\n", - " \n", - " _, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " inputs = results,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(size_layer)])\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - 
" training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embedding,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 3 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - 
"size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 1e-4\n", - "batch_size = 16\n", - "epoch = 20\n", - "maxlen = 10" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, 5, maxlen, len(dictionary_from), \n", - " len(dictionary_to), learning_rate, batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(short_questions, dictionary_from)\n", - "Y = str_idx(short_answers, dictionary_to)\n", - "X_test = str_idx(question_test, dictionary_from)\n", - "Y_test = str_idx(answer_test, dictionary_from)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def pad_sentence_batch_static(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, 
seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(short_questions), batch_size):\n", - " index = min(k + batch_size, len(short_questions))\n", - " batch_x, _ = pad_sentence_batch_static(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD)\n", - " predicted, loss, _, accuracy = sess.run([model.predicting_ids, model.cost, \n", - " model.optimizer, model.accuracy], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(short_questions) / batch_size)\n", - " total_accuracy /= (len(short_questions) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "batch_x, _ = pad_sentence_batch_static(X_test[:batch_size], PAD)\n", - "batch_y, seq_y = pad_sentence_batch(Y_test[:batch_size], PAD)\n", - "predicted = sess.run(model.predicting_ids, feed_dict={model.X:batch_x})\n", - "\n", - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] 
for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```text\n", - "epoch: 1, avg loss: 5.375520, avg accuracy: 0.239554\n", - "epoch: 2, avg loss: 4.713192, avg accuracy: 0.262143\n", - "epoch: 3, avg loss: 4.407076, avg accuracy: 0.277683\n", - "epoch: 4, avg loss: 4.114979, avg accuracy: 0.294326\n", - "epoch: 5, avg loss: 3.864636, avg accuracy: 0.302303\n", - "epoch: 6, avg loss: 3.666699, avg accuracy: 0.312473\n", - "epoch: 7, avg loss: 3.370519, avg accuracy: 0.343737\n", - "epoch: 8, avg loss: 3.116906, avg accuracy: 0.372554\n", - "epoch: 9, avg loss: 2.851404, avg accuracy: 0.403916\n", - "epoch: 10, avg loss: 2.587815, avg accuracy: 0.466819\n", - "epoch: 11, avg loss: 2.343917, avg accuracy: 0.517899\n", - "epoch: 12, avg loss: 2.135444, avg accuracy: 0.561270\n", - "epoch: 13, avg loss: 1.993201, avg accuracy: 0.600130\n", - "epoch: 14, avg loss: 1.876830, avg accuracy: 0.617571\n", - "epoch: 15, avg loss: 1.788884, avg accuracy: 0.632134\n", - "epoch: 16, avg loss: 1.716430, avg accuracy: 0.641549\n", - "epoch: 17, avg loss: 1.649575, avg accuracy: 0.655224\n", - "epoch: 18, avg loss: 1.631061, avg accuracy: 0.656174\n", - "epoch: 19, avg loss: 1.596259, avg accuracy: 0.656654\n", - "epoch: 20, avg loss: 1.615984, avg accuracy: 0.655007\n", - "row 1\n", - "QUESTION: plenty twentysix minutes\n", - "REAL ANSWER: we are not leaving\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 2\n", - "QUESTION: no why\n", - "REAL ANSWER: is david acting strangely\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 3\n", - "QUESTION: cool pictures you a fan\n", - "REAL ANSWER: yeah i guess\n", - "PREDICTED ANSWER: i am so ashamed \n", - "\n", - "row 4\n", - "QUESTION: ttthanks alice\n", - "REAL ANSWER: earth to alice\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 
1\n", - "QUESTION: but david\n", - "REAL ANSWER: is here that\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 2\n", - "QUESTION: hopeless it is hopeless\n", - "REAL ANSWER: tell ballet then back\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 3\n", - "QUESTION: miss price\n", - "REAL ANSWER: yes learning\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 4\n", - "QUESTION: mr kessler wake up please\n", - "REAL ANSWER: is here are\n", - "PREDICTED ANSWER: i am so ashamed \n", - "\n", - "row 5\n", - "QUESTION: there were witnesses\n", - "REAL ANSWER: why she out\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 6\n", - "QUESTION: what about it\n", - "REAL ANSWER: not you are\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 7\n", - "QUESTION: go on ask them\n", - "REAL ANSWER: i just home\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 8\n", - "QUESTION: beware the moon\n", - "REAL ANSWER: seen hi is he\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 9\n", - "QUESTION: did you hear that\n", - "REAL ANSWER: is down what\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 10\n", - "QUESTION: i heard that\n", - "REAL ANSWER: it here not\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 11\n", - "QUESTION: the hound of the baskervilles\n", - "REAL ANSWER: heard\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 12\n", - "QUESTION: it is moving\n", - "REAL ANSWER: not you hear\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 13\n", - "QUESTION: nice doggie good boy\n", - "REAL ANSWER: bill stupid\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 14\n", - "QUESTION: it sounds far away\n", - "REAL ANSWER: that pecos baby seen hi\n", - "PREDICTED ANSWER: i am fucked \n", - "\n", - "row 15\n", - "QUESTION: debbie klein cried a lot\n", - "REAL ANSWER: is will srai not\n", - "PREDICTED ANSWER: i am so ashamed \n", - "\n", - "row 16\n", - "QUESTION: what are you doing here\n", - "REAL 
ANSWER: is know look i\n", - "PREDICTED ANSWER: i am fucked \n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/37.lstm-birnn-seq2seq-contrib-beam-luongmonotonic.ipynb b/neural-machine-translation/37.lstm-birnn-seq2seq-contrib-beam-luongmonotonic.ipynb new file mode 100644 index 0000000..a229c46 --- /dev/null +++ b/neural-machine-translation/37.lstm-birnn-seq2seq-contrib-beam-luongmonotonic.ipynb @@ -0,0 +1,783 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + 
"source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongMonotonicAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = 
self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " encoder_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = 
encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with 
axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future 
version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[13764, 9943, 9943, 9943, 9943, 71, 71, 71, 31683,\n", + " 31683, 31683, 31683, 20783, 20783, 2946, 30402, 30402, 30402,\n", + " 30402, 4911, 4911, 4911, 4911, 4911, 23574, 23574, 23574,\n", + " 26997, 26997, 26997, 26997, 21008, 21008, 21008, 57, 57,\n", + " 57, 5543, 5543, 
5543, 5543, 5543, 31886, 31886, 19864,\n", + " 19864, 25798, 25798, 25798, 26882, 26882, 21539, 21539, 21539,\n", + " 21539, 7265, 7265, 14401, 14401, 22604, 22604, 22604, 22604,\n", + " 22604, 31174, 31174, 31174, 31174, 31332, 31332, 25614, 25614],\n", + " [26180, 26180, 26180, 8869, 8869, 23088, 23088, 23088, 24766,\n", + " 24766, 24766, 24766, 24766, 24766, 8140, 8140, 1537, 1537,\n", + " 6285, 6285, 412, 6285, 6285, 6285, 6285, 6285, 27420,\n", + " 6285, 22508, 22508, 22508, 31892, 31892, 15068, 31892, 31892,\n", + " 31892, 31892, 16849, 16849, 2215, 2215, 2215, 20435, 20435,\n", + " 20435, 25981, 25981, 25981, 25981, 25981, 28458, 28458, 28458,\n", + " 20028, 20028, 20028, 18530, 18530, 18530, 18530, 18530, 14979,\n", + " 6706, 6706, 6576, 6576, 18530, 18530, 18530, 7059, 9706],\n", + " [29349, 29349, 29349, 17487, 17487, 11336, 11336, 11336, 13102,\n", + " 13102, 13102, 13102, 11773, 23295, 14379, 14379, 14379, 3413,\n", + " 3413, 23454, 8322, 12988, 12988, 24230, 24230, 24230, 26781,\n", + " 26781, 26781, 26781, 30630, 30630, 30630, 30630, 24549, 24549,\n", + " 24549, 24549, 24549, 6738, 6738, 6738, 6738, 6738, 6738,\n", + " 6738, 6738, 6738, 15861, 29311, 29311, 29311, 29311, 29311,\n", + " 11138, 23344, 23344, 23344, 23344, 18168, 18168, 26465, 26465,\n", + " 26465, 26465, 26465, 26465, 14934, 13228, 13228, 1212, 1212],\n", + " [11295, 11295, 6155, 6155, 6155, 23357, 7313, 7313, 7313,\n", + " 7313, 7313, 3558, 3558, 11456, 11456, 11456, 5703, 5703,\n", + " 5703, 5703, 5703, 12740, 17465, 2786, 2786, 2786, 30724,\n", + " 28170, 28170, 6997, 18855, 18855, 18855, 18855, 18855, 11598,\n", + " 11598, 11598, 4196, 11598, 4196, 4196, 15285, 15285, 22764,\n", + " 22764, 30013, 30013, 30013, 30013, 13052, 15276, 23860, 23860,\n", + " 23860, 23860, 23860, 23860, 31283, 31283, 31283, 31283, 5790,\n", + " 5790, 5790, 15935, 15935, 17368, 693, 693, 17368, 17368],\n", + " [ 3359, 9710, 13616, 13616, 13616, 8382, 8382, 8382, 4783,\n", + " 4783, 4783, 853, 853, 853, 
26289, 26289, 26289, 21892,\n", + " 17942, 17942, 17942, 17942, 3689, 3689, 3689, 3748, 3748,\n", + " 22205, 9216, 9216, 9216, 5663, 5663, 5663, 5663, 5663,\n", + " 153, 153, 153, 153, 153, 153, 6291, 6291, 6291,\n", + " 6291, 645, 5954, 5954, 5954, 15677, 15677, 15677, 16948,\n", + " 27526, 30687, 30687, 30687, 21006, 21006, 24316, 24316, 24316,\n", + " 6680, 6680, 3438, 3621, 3621, 3621, 3621, 28702, 12922],\n", + " [21632, 21632, 21632, 21632, 30271, 23491, 23491, 23491, 23491,\n", + " 247, 247, 247, 12061, 11626, 11626, 10005, 10005, 8678,\n", + " 14534, 14534, 14534, 14534, 3463, 3463, 3463, 3463, 20030,\n", + " 20030, 20030, 20030, 20030, 20030, 22995, 22995, 22995, 13857,\n", + " 2363, 2363, 2363, 2363, 2363, 7874, 7874, 7874, 16141,\n", + " 16141, 16141, 16141, 13555, 13555, 7918, 7918, 25749, 13263,\n", + " 13263, 11193, 11193, 11193, 11193, 11193, 11193, 31225, 31225,\n", + " 31225, 31225, 11393, 11393, 26707, 26707, 8955, 8955, 7208],\n", + " [12463, 12463, 12463, 12463, 12463, 12463, 12463, 12463, 12463,\n", + " 12463, 12463, 12463, 12463, 9934, 9934, 19975, 9934, 19975,\n", + " 14220, 14220, 14220, 18331, 18331, 28019, 28019, 28997, 28997,\n", + " 9456, 9456, 908, 14114, 18147, 18147, 18147, 18147, 18179,\n", + " 25612, 22909, 22909, 22909, 22909, 22909, 6261, 6261, 6261,\n", + " 6261, 6261, 26173, 26173, 26173, 28021, 28021, 28021, 28021,\n", + " 28021, 28021, 13325, 13325, 13325, 2006, 14431, 14431, 14431,\n", + " 14431, 14431, 14431, 11990, 11990, 11990, 11990, 28866, 20084],\n", + " [ 4224, 4224, 15016, 15016, 15016, 15016, 16455, 11409, 11409,\n", + " 11262, 11262, 11262, 10154, 10154, 10154, 10154, 7035, 7035,\n", + " 7035, 7035, 7035, 20978, 20978, 20978, 20978, 20978, 14780,\n", + " 14780, 14780, 14780, 25484, 25484, 25484, 25484, 25484, 21047,\n", + " 21047, 21047, 22903, 22903, 22903, 22903, 24071, 17022, 17022,\n", + " 5774, 5774, 17022, 27819, 27819, 20714, 20714, 20714, 20714,\n", + " 20714, 10058, 10058, 10058, 10058, 10058, 9219, 9219, 
9219,\n", + " 9219, 2072, 2072, 2072, 2072, 2072, 2072, 2072, 22870],\n", + " [ 6353, 4757, 4757, 4757, 12064, 12064, 12064, 12064, 12064,\n", + " 12064, 31126, 31126, 31126, 18660, 18660, 18660, 18660, 18660,\n", + " 22430, 22430, 17518, 19106, 19106, 19106, 19106, 19106, 7162,\n", + " 7162, 7162, 7162, 7162, 18738, 18738, 10935, 16328, 16328,\n", + " 16328, 16328, 16328, 15687, 9039, 9039, 16731, 9039, 7042,\n", + " 7042, 14348, 14348, 14348, 14348, 8801, 8801, 8801, 12610,\n", + " 12610, 8468, 5511, 5511, 5511, 7917, 7917, 3460, 18155,\n", + " 18155, 13228, 13228, 24543, 24543, 926, 926, 926, 926],\n", + " [20565, 20565, 20565, 20565, 20565, 12656, 12656, 12656, 12656,\n", + " 18279, 18279, 18279, 18279, 18279, 18279, 16750, 30923, 30923,\n", + " 30923, 30923, 30923, 30923, 30923, 15158, 15158, 1706, 1706,\n", + " 1706, 1706, 1706, 1706, 7833, 7833, 24907, 24907, 24907,\n", + " 24907, 29636, 7235, 7235, 7235, 18554, 20203, 20203, 20203,\n", + " 20203, 20203, 14094, 14094, 14094, 14094, 12643, 12643, 12643,\n", + " 2905, 2905, 2905, 2905, 18593, 28550, 16025, 23197, 23197,\n", + " 23197, 23197, 1991, 18930, 18930, 18930, 18930, 18930, 18930]],\n", + " dtype=int32), 10.374412, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [14:24<00:00, 1.81it/s, accuracy=0.243, cost=4.69]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.80it/s, accuracy=0.285, cost=4.09]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + 
}, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.06368613" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/38.capsule-lstm-seq2seq-luong-beam.ipynb b/neural-machine-translation/38.capsule-lstm-seq2seq-luong-beam.ipynb deleted file mode 100644 index ae0a910..0000000 --- a/neural-machine-translation/38.capsule-lstm-seq2seq-luong-beam.ipynb +++ /dev/null @@ -1,504 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter 
if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lines = open('movie_lines.txt', encoding='utf-8', errors='ignore').read().split('\\n')\n", - "conv_lines = open('movie_conversations.txt', encoding='utf-8', errors='ignore').read().split('\\n')\n", - "\n", - "id2line = {}\n", - "for line in lines:\n", - " _line = line.split(' +++$+++ ')\n", - " if len(_line) == 5:\n", - " id2line[_line[0]] = _line[4]\n", - " \n", - "convs = [ ]\n", - "for line in conv_lines[:-1]:\n", - " _line = line.split(' +++$+++ ')[-1][1:-1].replace(\"'\",\"\").replace(\" \",\"\")\n", - " convs.append(_line.split(','))\n", - " \n", - "questions = []\n", - "answers = []\n", - "\n", - "for conv in convs:\n", - " for i in range(len(conv)-1):\n", - " questions.append(id2line[conv[i]])\n", - " answers.append(id2line[conv[i+1]])\n", - " \n", - "def clean_text(text):\n", - " text = text.lower()\n", - " text = re.sub(r\"i'm\", \"i am\", text)\n", - " text = re.sub(r\"he's\", \"he is\", text)\n", - " text = re.sub(r\"she's\", \"she is\", text)\n", - " text = re.sub(r\"it's\", \"it is\", text)\n", - " text = re.sub(r\"that's\", \"that is\", text)\n", - " text = re.sub(r\"what's\", \"that is\", text)\n", - " text = re.sub(r\"where's\", \"where is\", text)\n", - " text = re.sub(r\"how's\", \"how is\", text)\n", - " text = re.sub(r\"\\'ll\", \" will\", text)\n", - " text = re.sub(r\"\\'ve\", \" have\", text)\n", - " text = re.sub(r\"\\'re\", \" 
are\", text)\n", - " text = re.sub(r\"\\'d\", \" would\", text)\n", - " text = re.sub(r\"\\'re\", \" are\", text)\n", - " text = re.sub(r\"won't\", \"will not\", text)\n", - " text = re.sub(r\"can't\", \"cannot\", text)\n", - " text = re.sub(r\"n't\", \" not\", text)\n", - " text = re.sub(r\"n'\", \"ng\", text)\n", - " text = re.sub(r\"'bout\", \"about\", text)\n", - " text = re.sub(r\"'til\", \"until\", text)\n", - " text = re.sub(r\"[-()\\\"#/@;:<>{}`+=~|.!?,]\", \"\", text)\n", - " return ' '.join([i.strip() for i in filter(None, text.split())])\n", - "\n", - "clean_questions = []\n", - "for question in questions:\n", - " clean_questions.append(clean_text(question))\n", - " \n", - "clean_answers = [] \n", - "for answer in answers:\n", - " clean_answers.append(clean_text(answer))\n", - " \n", - "min_line_length = 2\n", - "max_line_length = 5\n", - "short_questions_temp = []\n", - "short_answers_temp = []\n", - "\n", - "i = 0\n", - "for question in clean_questions:\n", - " if len(question.split()) >= min_line_length and len(question.split()) <= max_line_length:\n", - " short_questions_temp.append(question)\n", - " short_answers_temp.append(clean_answers[i])\n", - " i += 1\n", - "\n", - "short_questions = []\n", - "short_answers = []\n", - "\n", - "i = 0\n", - "for answer in short_answers_temp:\n", - " if len(answer.split()) >= min_line_length and len(answer.split()) <= max_line_length:\n", - " short_answers.append(answer)\n", - " short_questions.append(short_questions_temp[i])\n", - " i += 1\n", - "\n", - "question_test = short_questions[500:550]\n", - "answer_test = short_answers[500:550]\n", - "short_questions = short_questions[:500]\n", - "short_answers = short_answers[:500]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_from = ' '.join(short_questions+question_test).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, 
rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])\n", - "print('filtered vocab size:',len(dictionary_from))\n", - "print(\"% of vocab used: {}%\".format(round(len(dictionary_from)/vocabulary_size_from,4)*100))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "concat_to = ' '.join(short_answers+answer_test).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab from size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])\n", - "print('filtered vocab size:',len(dictionary_to))\n", - "print(\"% of vocab used: {}%\".format(round(len(dictionary_to)/vocabulary_size_to,4)*100))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(short_answers)):\n", - " short_answers[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def squash(X, epsilon = 1e-9):\n", - " vec_squared_norm = tf.reduce_sum(tf.square(X), -2, keep_dims=True)\n", - " scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + epsilon)\n", - " return scalar_factor * X\n", - "\n", - "def conv_layer(X, num_output, num_vector, kernel=None, 
stride=None):\n", - " capsules = tf.layers.conv1d(X, num_output * num_vector,\n", - " kernel, stride, padding=\"VALID\", activation=tf.nn.relu)\n", - " capsules = tf.reshape(capsules, (tf.shape(X)[0], -1, num_vector, 1))\n", - " return squash(capsules)\n", - "\n", - "def routing(X, b_IJ, seq_len, dimension_out, routing_times = 2):\n", - " shape_X = tf.shape(X)[1]\n", - " w = tf.Variable(tf.truncated_normal([1, 1, seq_len, 4, dimension_out//2], stddev=1e-1))\n", - " X = tf.tile(X, [1, 1, seq_len, 1, dimension_out])\n", - " w = tf.tile(w, [tf.shape(X)[0], tf.shape(X)[1], 1, 1, routing_times])\n", - " print('X shape: %s, w shape: %s'%(str(X.shape), str(w.shape)))\n", - " u_hat = tf.matmul(w, X, transpose_a=True)\n", - " u_hat_stopped = tf.stop_gradient(u_hat)\n", - " for i in range(routing_times):\n", - " c_IJ = tf.nn.softmax(b_IJ, dim=2)\n", - " print(c_IJ)\n", - " if i == routing_times - 1:\n", - " s_J = tf.multiply(c_IJ, u_hat)\n", - " s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)\n", - " v_J = squash(s_J)\n", - " else:\n", - " s_J = tf.multiply(c_IJ, u_hat_stopped)\n", - " s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)\n", - " v_J = squash(s_J)\n", - " v_J_tiled = tf.tile(v_J, [1, shape_X, 1, 1, 1])\n", - " u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)\n", - " b_IJ += u_produce_v\n", - " return v_J\n", - "\n", - "def fully_conn_layer(X, num_output, dimension_out):\n", - " batch_size = tf.shape(X)[1]\n", - " X_ = tf.reshape(X, shape=(tf.shape(X)[0], -1, 1, X.shape[-2].value, 1))\n", - " b_IJ = tf.fill([tf.shape(X)[0], tf.shape(X)[1], num_output, 1, 1], 0.0)\n", - " capsules = routing(X_, b_IJ, num_output, dimension_out, routing_times = 2)\n", - " capsules = tf.squeeze(capsules, axis=1)\n", - " return capsules\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, seq_len, maxlen,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size,\n", - " kernels=[2, 4, 
4],strides=[3,2,1],epsilon=1e-8,\n", - " force_teaching_ratio=0.5,beam_width = 5):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, maxlen])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, self.X)\n", - " \n", - " results = []\n", - " for i in range(len(kernels)):\n", - " conv = tf.layers.conv1d(encoder_embedded, filters=32,\n", - " kernel_size=kernels[i], strides=strides[i],\n", - " padding='VALID')\n", - " caps1 = conv_layer(conv, 4, 4, kernels[i], strides[i])\n", - " caps2 = fully_conn_layer(caps1,seq_len,embedded_size)\n", - " v_length = tf.sqrt(tf.reduce_sum(tf.square(caps2),axis=2, keep_dims=True) + epsilon)[:,:,0,:]\n", - " print('output shape: %s\\n'%(str(v_length.shape)))\n", - " results.append(v_length)\n", - " results = tf.concat(results,1)\n", - " self.X_seq_len = tf.fill([batch_size], seq_len * len(kernels))\n", - " \n", - " self.encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " inputs = results,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = 
self.X_seq_len)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embedding,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cell,\n", - " embedding = decoder_embedding,\n", - " 
start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(cell_state = encoder_state_tiled),\n", - " beam_width = beam_width,\n", - " output_layer = dense,\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 1e-2\n", - "batch_size = 16\n", - "epoch = 20\n", - "maxlen = 10" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, 5, maxlen, len(dictionary_from), \n", - " len(dictionary_to), learning_rate, batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(short_questions, dictionary_from)\n", - "Y = str_idx(short_answers, dictionary_to)\n", - "X_test = str_idx(question_test, dictionary_from)\n", - "Y_test = str_idx(answer_test, dictionary_from)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def pad_sentence_batch_static(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(short_questions), batch_size):\n", - " index = min(k + batch_size, len(short_questions))\n", - " batch_x, _ = pad_sentence_batch_static(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD)\n", - " predicted, loss, _, accuracy = sess.run([model.predicting_ids, model.cost, \n", - " 
model.optimizer, model.accuracy], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(short_questions) / batch_size)\n", - " total_accuracy /= (len(short_questions) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "batch_x, _ = pad_sentence_batch_static(X_test[:batch_size], PAD)\n", - "batch_y, seq_y = pad_sentence_batch(Y_test[:batch_size], PAD)\n", - "predicted = sess.run(model.predicting_ids, feed_dict={model.X:batch_x})\n", - "\n", - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": 
"python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/38.gru-birnn-seq2seq-contrib-beam-luongmonotic.ipynb b/neural-machine-translation/38.gru-birnn-seq2seq-contrib-beam-luongmonotic.ipynb new file mode 100644 index 0000000..fd7f534 --- /dev/null +++ b/neural-machine-translation/38.gru-birnn-seq2seq-contrib-beam-luongmonotic.ipynb @@ -0,0 +1,757 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + 
"class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongMonotonicAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = 
attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " 
masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use 
`keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please 
see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 4806, 4806, 7844, 17749, 7844, 7844, 7844, 7844, 8016,\n", + " 16931, 16931, 16931, 10817, 28564, 28564, 28564, 28564, 28564,\n", + " 28564, 28564, 15376, 15376, 15376, 15376, 12176, 12176, 12176,\n", + " 12176, 12176, 21343, 21343, 21343, 21343, 21343, 21343, 16575,\n", + " 16575, 16575, 16575, 16575, 16575, 16575, 16575, 12224, 31957,\n", + " 16715, 16715, 20056, 20056, 20056, 20056, 20056, 20056, 20056,\n", + " 21818, 21818, 8402, 19629, 19629, 19629, 19629, 19629, 13732,\n", + " 19629, 900, 19629, 900, 19629, 900, 18836, 
18836, 18836],\n", + " [17250, 17250, 9385, 9385, 9385, 9385, 529, 22489, 22489,\n", + " 22489, 22489, 22489, 22489, 22489, 29215, 29215, 333, 333,\n", + " 333, 333, 10683, 10683, 10683, 10683, 18647, 18647, 18647,\n", + " 18647, 18513, 18513, 18513, 18513, 18513, 18513, 18513, 3031,\n", + " 3031, 3031, 3031, 8819, 8819, 8819, 19052, 27120, 27120,\n", + " 27120, 27120, 27120, 19378, 20033, 20033, 20033, 20033, 20033,\n", + " 20033, 7215, 21854, 7215, 21854, 19466, 19466, 28978, 28978,\n", + " 28978, 23529, 8934, 8934, 8934, 21479, 21479, 21479, 21479],\n", + " [ 4953, 4953, 4953, 539, 539, 19389, 19389, 19389, 19389,\n", + " 19389, 19389, 20751, 26159, 26159, 26159, 26159, 30993, 26159,\n", + " 30993, 30993, 26159, 24110, 24110, 24110, 24110, 16273, 16273,\n", + " 16273, 24921, 24921, 27902, 27902, 5450, 5450, 24918, 24918,\n", + " 24918, 24918, 12684, 5245, 5245, 5245, 5245, 21501, 21501,\n", + " 21501, 30324, 30324, 30324, 4791, 23984, 23984, 23984, 23984,\n", + " 23984, 23984, 20529, 17426, 20147, 20147, 20147, 20147, 20147,\n", + " 20147, 20147, 20147, 20147, 24958, 24958, 24958, 24958, 4406],\n", + " [ 4847, 27272, 27272, 27272, 31588, 31588, 31588, 31588, 31588,\n", + " 31588, 31588, 19436, 19436, 24607, 24607, 24607, 3090, 3090,\n", + " 30108, 3090, 3090, 3090, 16464, 16464, 16464, 16464, 27971,\n", + " 27971, 27971, 27971, 31822, 31822, 31822, 24278, 9134, 9134,\n", + " 9134, 9134, 13251, 12523, 12523, 12523, 12523, 12523, 6067,\n", + " 6067, 6067, 6067, 2386, 25595, 25595, 25595, 24587, 24587,\n", + " 11974, 15848, 15848, 15848, 15848, 15848, 15848, 4196, 4196,\n", + " 4196, 15562, 6785, 6785, 6785, 6785, 6785, 19757, 25325],\n", + " [29951, 29951, 10151, 10151, 23033, 23033, 23033, 23033, 23033,\n", + " 23033, 23033, 23033, 23033, 23033, 23033, 23033, 23033, 23033,\n", + " 23033, 23033, 23033, 23033, 23033, 22353, 23033, 12606, 23033,\n", + " 12606, 23033, 9314, 9314, 9314, 8512, 8512, 8512, 8512,\n", + " 18743, 18743, 18743, 12910, 12910, 10512, 10512, 
10512, 10512,\n", + " 10512, 10512, 10512, 10512, 10512, 10512, 10512, 10512, 6653,\n", + " 6653, 6653, 6653, 6653, 6653, 6653, 6081, 6081, 3562,\n", + " 2542, 2542, 2542, 2542, 2542, 2542, 2542, 11756, 11756],\n", + " [14603, 11412, 11412, 11412, 11412, 24982, 24982, 24982, 25475,\n", + " 25475, 25475, 15962, 15962, 15962, 1245, 1245, 100, 100,\n", + " 100, 100, 1030, 1030, 1030, 1030, 1030, 1030, 1030,\n", + " 23691, 23691, 23691, 23691, 23691, 23691, 13918, 17390, 17390,\n", + " 17390, 17390, 20359, 20359, 20359, 20359, 20359, 20359, 20359,\n", + " 15511, 12559, 12559, 12559, 12559, 12559, 12559, 30304, 30304,\n", + " 30304, 30304, 2980, 8720, 8720, 8720, 11791, 8720, 11791,\n", + " 11791, 24460, 24460, 25546, 25546, 23924, 23924, 23924, 25503],\n", + " [24002, 24002, 24002, 30458, 7234, 7234, 7234, 24532, 24532,\n", + " 31970, 24359, 24359, 24359, 24359, 24359, 22397, 25215, 25215,\n", + " 20956, 20956, 31304, 31304, 31304, 17272, 16791, 16791, 16791,\n", + " 3700, 3700, 3700, 25408, 25408, 25408, 29985, 29985, 29985,\n", + " 27554, 27554, 27554, 27554, 27554, 27554, 27554, 27554, 17856,\n", + " 17856, 17856, 8448, 8448, 17942, 17942, 17942, 11394, 6256,\n", + " 6256, 6256, 6256, 28357, 6596, 6596, 6596, 6596, 15738,\n", + " 2777, 31224, 31224, 31224, 30284, 30284, 30284, 30284, 2272],\n", + " [20269, 5486, 5486, 5486, 5486, 5486, 5486, 5486, 5486,\n", + " 24416, 3316, 19120, 19120, 19120, 19120, 31386, 31386, 31386,\n", + " 31386, 31386, 31386, 31386, 31386, 17628, 28401, 28401, 28401,\n", + " 28401, 28401, 28401, 28401, 31876, 31876, 31876, 31876, 31876,\n", + " 31876, 31876, 31876, 20924, 20924, 20924, 20924, 24799, 24799,\n", + " 12042, 12042, 12042, 21228, 24799, 21228, 21228, 21228, 28927,\n", + " 28927, 28927, 28927, 28927, 13477, 13477, 30888, 30888, 16526,\n", + " 27831, 21674, 21674, 21674, 13416, 13416, 13416, 3120, 3120],\n", + " [ 1066, 7228, 7228, 7228, 7228, 529, 529, 529, 529,\n", + " 529, 529, 4640, 4640, 27396, 27396, 27396, 27396, 27396,\n", 
+ " 529, 12273, 12273, 8278, 28486, 12092, 12092, 12092, 12092,\n", + " 12092, 12092, 12092, 12092, 12092, 30640, 30640, 30640, 30640,\n", + " 13809, 13809, 23164, 23164, 23164, 23164, 23164, 23598, 23598,\n", + " 23598, 23598, 23541, 23541, 23541, 23541, 2376, 18999, 18999,\n", + " 3412, 3412, 3412, 3412, 3412, 4030, 2529, 4030, 9867,\n", + " 9867, 9867, 9867, 9867, 9867, 840, 4519, 29594, 29594],\n", + " [ 2453, 2453, 8535, 8535, 8535, 8535, 8535, 21293, 21293,\n", + " 21293, 21293, 18657, 18657, 18657, 18657, 18657, 4785, 3685,\n", + " 3685, 13051, 29321, 29321, 29321, 29321, 29321, 20887, 20887,\n", + " 20887, 8626, 8626, 8626, 8626, 8626, 28782, 1896, 1896,\n", + " 1896, 1896, 1896, 1896, 4763, 4763, 4763, 15011, 1528,\n", + " 27520, 27520, 27520, 27520, 1528, 27520, 27520, 5709, 5709,\n", + " 5709, 5709, 29886, 17387, 8903, 26464, 8903, 8903, 8903,\n", + " 23782, 27512, 16611, 16611, 22752, 22752, 22752, 20702, 20702]],\n", + " dtype=int32), 10.376848, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [14:50<00:00, 1.75it/s, accuracy=0.253, cost=4.67]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.76it/s, accuracy=0.317, cost=3.97]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": 
"code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.06407658" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/39.lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic.ipynb b/neural-machine-translation/39.lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic.ipynb new file mode 100644 index 0000000..fd079de --- /dev/null +++ b/neural-machine-translation/39.lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic.ipynb @@ -0,0 +1,674 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { 
+ "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauMonotonicAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in 
range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " encoder_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = 
tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + 
"name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead 
of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[20372, 20372, 15834, 14405, 14405, 
+ " ... (remaining rows of the 10-sequence token-id matrix elided; the values are vocabulary indices emitted by the not-yet-trained model) ...\n",
16928, 16928, 25953, 25953, 30887, 30887, 30887, 30887,\n", + " 30887, 30887, 30887, 30887, 30887, 30887, 30466, 30466, 30466,\n", + " 30466, 30466, 30466, 3433, 3433, 3433, 3433, 12081, 12081,\n", + " 12081, 12081, 11149, 2248, 2248, 2248, 4359, 4359, 4359,\n", + " 4359, 4359, 4359, 28193, 28193, 28193, 28193, 28193, 3297],\n", + " [ 5451, 5451, 5451, 19000, 19000, 19000, 19000, 18801, 2606,\n", + " 2606, 7981, 7981, 7981, 7981, 7981, 908, 908, 908,\n", + " 28816, 28816, 28816, 28816, 28816, 28816, 8319, 8319, 199,\n", + " 199, 199, 199, 17355, 17355, 17355, 17355, 17355, 17355,\n", + " 17355, 1227, 5802, 30287, 30287, 30287, 30287, 30287, 29201,\n", + " 29201, 24617, 24617, 24617, 10626, 10626, 10626, 23183, 23183,\n", + " 23183, 23183, 13665, 13665, 13665, 13665, 13665, 24439, 24439,\n", + " 6512, 6512, 6512, 15533, 15533, 15533, 15533, 11103, 11103],\n", + " [17668, 17668, 17668, 15000, 15000, 15000, 17241, 17241, 17241,\n", + " 17241, 3304, 3304, 3304, 3304, 26052, 26052, 26052, 18177,\n", + " 18177, 1224, 1224, 1224, 26627, 26627, 26627, 26627, 13620,\n", + " 29626, 29626, 29626, 29626, 21084, 10496, 21084, 28901, 28901,\n", + " 28901, 31169, 31169, 31169, 31169, 9954, 21852, 13957, 13957,\n", + " 13957, 13957, 27495, 27495, 27495, 5154, 5154, 5154, 5154,\n", + " 5154, 23980, 13425, 13425, 13425, 13425, 13425, 15938, 15938,\n", + " 7135, 20449, 20449, 2820, 2820, 2820, 22323, 14889, 14889]],\n", + " dtype=int32), 10.3737135, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 
100%|██████████| 1563/1563 [14:39<00:00, 1.78it/s, accuracy=0.241, cost=4.66]\n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.82it/s, accuracy=0.29, cost=4.01] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.17586066" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/39.lstm-birnn-seq2seq-luong-bahdanau-stack-beam-dropout-l2.ipynb b/neural-machine-translation/39.lstm-birnn-seq2seq-luong-bahdanau-stack-beam-dropout-l2.ipynb deleted file mode 100644 index 3cd0487..0000000 --- a/neural-machine-translation/39.lstm-birnn-seq2seq-luong-bahdanau-stack-beam-dropout-l2.ipynb +++ /dev/null @@ -1,490 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - 
"def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common 
words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def cells(size, reuse=False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),reuse=reuse),\n", - " input_keep_prob=0.8,\n", - " output_keep_prob=0.8,\n", - " state_keep_prob=0.8)\n", - " \n", - " self.X = 
tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " def bahdanau(size):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " def luong(size):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size, \n", - " memory = self.encoder_out)\n", - " return tf.contrib.seq2seq.AttentionWrapper(cell = cells(size), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = bahdanau(size_layer//2),\n", - " cell_bw = luong(size_layer//2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state_c = tf.concat((state_fw[0].c, state_bw[0].c), -1)\n", - " bi_state_h = tf.concat((state_fw[0].h, state_bw[0].h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " dense = tf.layers.Dense(to_dict_size)\n", - " \n", - " with 
tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " bahdanau_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = decoder_cells.zero_state(batch_size, tf.float32),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with 
tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " luong_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " bahdanau_cells = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer,reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([luong_cells, bahdanau_cells])\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cells,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cells.zero_state(batch_size * beam_width, tf.float32),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, _reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = 
predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " l2 = sum(1e-5 * tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables())\n", - " self.cost += l2\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 512\n", - "num_layers = 2\n", - "embedded_size = 256\n", - "learning_rate = 1e-2\n", - "batch_size = 16\n", - "epoch = 30" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size, learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 9.510825, avg accuracy: 0.022027\n", - "epoch: 2, avg loss: 7.445881, avg accuracy: 0.038613\n", - "epoch: 3, avg loss: 6.863027, avg accuracy: 0.041087\n", - "epoch: 4, avg loss: 6.399978, avg accuracy: 0.042979\n", - "epoch: 5, avg loss: 6.248574, avg accuracy: 0.049162\n", - "epoch: 6, avg loss: 6.391312, avg accuracy: 0.049706\n", - "epoch: 7, avg loss: 6.616857, avg accuracy: 0.045438\n", - "epoch: 8, avg loss: 6.609593, avg accuracy: 0.045675\n", - "epoch: 9, avg loss: 6.377094, avg accuracy: 0.051762\n", - "epoch: 10, avg 
loss: 6.144921, avg accuracy: 0.053860\n", - "epoch: 11, avg loss: 5.762990, avg accuracy: 0.056989\n", - "epoch: 12, avg loss: 5.557138, avg accuracy: 0.054019\n", - "epoch: 13, avg loss: 5.407917, avg accuracy: 0.055601\n", - "epoch: 14, avg loss: 5.249044, avg accuracy: 0.063694\n", - "epoch: 15, avg loss: 5.209891, avg accuracy: 0.062668\n", - "epoch: 16, avg loss: 5.200728, avg accuracy: 0.059336\n", - "epoch: 17, avg loss: 5.219026, avg accuracy: 0.061041\n", - "epoch: 18, avg loss: 5.233603, avg accuracy: 0.061750\n", - "epoch: 19, avg loss: 5.289140, avg accuracy: 0.064320\n", - "epoch: 20, avg loss: 5.292430, avg accuracy: 0.063309\n", - "epoch: 21, avg loss: 5.330809, avg accuracy: 0.067276\n", - "epoch: 22, avg loss: 5.271437, avg accuracy: 0.064013\n", - "epoch: 23, avg loss: 5.333670, avg accuracy: 0.063765\n", - "epoch: 24, avg loss: 5.283467, avg accuracy: 0.065662\n", - "epoch: 25, avg loss: 5.218686, avg accuracy: 0.064164\n", - "epoch: 26, avg loss: 5.265750, avg accuracy: 0.068734\n", - "epoch: 27, avg loss: 5.202237, avg accuracy: 0.066134\n", - "epoch: 28, avg loss: 5.127524, avg accuracy: 0.069533\n", - "epoch: 29, avg loss: 5.069793, avg accuracy: 0.069069\n", - "epoch: 30, avg loss: 5.056033, avg accuracy: 0.066305\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: 
%f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: bạn \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: nếu ? 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 , 2 2 2 2 ? 2 2 ? ? ? ? 2 2 2 ? 2 ? 2 ? ? ? ? ? ? 2 2 ? ? ? ? ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời thời gian gian gian gian gian gian gian gian gian gian gian gian gian tục gian gian gian gian gian gian gian gian thời gian gian gian gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - 
], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/4.basic-seq2seq-api-greedy.ipynb b/neural-machine-translation/4.basic-seq2seq-api-greedy.ipynb deleted file mode 100644 index b5eddf1..0000000 --- a/neural-machine-translation/4.basic-seq2seq-api-greedy.ipynb +++ /dev/null @@ -1,409 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len 
to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', 
data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " _, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " inputs = tf.nn.embedding_lookup(encoder_embedding, self.X),\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs 
= tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embedding,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = 
tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :6: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", 
- " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.639511, avg accuracy: 0.061922\n", - "epoch: 2, avg loss: 5.879192, avg accuracy: 0.102900\n", - "epoch: 3, avg loss: 5.572026, avg accuracy: 0.130366\n", - "epoch: 4, avg loss: 5.232392, avg accuracy: 0.152376\n", - "epoch: 5, avg loss: 4.865276, avg accuracy: 0.188383\n", - "epoch: 6, avg loss: 4.470282, avg accuracy: 0.234282\n", - "epoch: 7, avg loss: 4.052670, avg accuracy: 0.287898\n", - "epoch: 8, avg loss: 3.640871, avg accuracy: 0.354405\n", - "epoch: 9, avg loss: 3.249600, avg accuracy: 0.421314\n", - "epoch: 10, avg loss: 2.895601, avg accuracy: 0.487875\n", - "epoch: 11, avg loss: 2.601654, avg accuracy: 0.542940\n", - "epoch: 12, avg loss: 2.318833, avg accuracy: 0.599123\n", - "epoch: 13, avg loss: 2.023742, avg accuracy: 0.663397\n", - "epoch: 14, avg loss: 1.755580, avg accuracy: 0.723884\n", - "epoch: 15, avg loss: 1.522422, avg accuracy: 0.777484\n", - "epoch: 16, avg loss: 1.317238, avg accuracy: 0.815015\n", - "epoch: 17, avg loss: 1.125093, avg accuracy: 0.853098\n", - "epoch: 18, avg loss: 0.933701, avg accuracy: 0.893458\n", - "epoch: 19, avg loss: 0.757271, avg accuracy: 0.931947\n", - "epoch: 20, avg loss: 0.605072, avg accuracy: 0.960998\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= 
(len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/4.basic-seq2seq-contrib-greedy.ipynb b/neural-machine-translation/4.basic-seq2seq-contrib-greedy.ipynb new file mode 100644 index 0000000..20d2712 --- /dev/null +++ b/neural-machine-translation/4.basic-seq2seq-contrib-greedy.ipynb @@ -0,0 +1,765 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = 
data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " _, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " \n", + " training_helper = 
tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " 
self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 8092, 364, 23527, 16731, 12432, 24937, 21081, 21142, 14804,\n", + " 18234, 25024, 14908, 27933, 23633, 8088, 26961, 10696, 8350,\n", + " 2336, 1560, 9475, 7028, 1952, 9737, 27888, 27603, 21554,\n", + " 5376, 30761, 24453, 14154, 25407, 13988, 11466, 27134, 17576,\n", + " 10293, 435, 28450, 28138, 31434, 2669, 9231, 21043, 29167,\n", + " 3865, 12123, 26151, 22312, 20040, 16020, 7213, 14383, 6306,\n", + " 17745, 20872, 21499, 20713, 27365, 4323, 30281, 647, 23627,\n", + " 10903, 12831, 24343, 869, 27604, 15795, 15732, 28700, 7564],\n", + " [13267, 25553, 398, 27889, 9162, 18628, 28777, 
7077, 25680,\n", + " 10341, 31146, 16193, 15781, 5684, 20317, 1807, 3450, 9648,\n", + " 19283, 18304, 14013, 11349, 8453, 3195, 463, 20584, 8176,\n", + " 19834, 13095, 24631, 26027, 5502, 2789, 13208, 19939, 28314,\n", + " 7387, 16891, 23451, 19947, 17939, 7440, 4705, 5147, 29115,\n", + " 9981, 21912, 9427, 19677, 30367, 11677, 18783, 29427, 22706,\n", + " 27585, 26323, 22117, 12203, 12237, 17905, 25664, 28479, 23885,\n", + " 8847, 29361, 13901, 25015, 30020, 6621, 18262, 20044, 793],\n", + " [25888, 13104, 19073, 13882, 25347, 28031, 16980, 13663, 30630,\n", + " 21524, 22420, 20476, 19504, 3087, 18376, 20286, 19796, 21386,\n", + " 11998, 14303, 11230, 31254, 9698, 2441, 18495, 563, 20120,\n", + " 4043, 26373, 2561, 13558, 21822, 11268, 20936, 17852, 30478,\n", + " 15108, 23839, 17263, 1781, 10493, 27366, 28273, 19574, 22308,\n", + " 12773, 15046, 21778, 5379, 17279, 17848, 16524, 3791, 9124,\n", + " 1470, 11228, 20237, 10543, 1773, 30716, 4949, 23964, 30918,\n", + " 1606, 15265, 2073, 1912, 2732, 3298, 6347, 4863, 16785],\n", + " [ 3953, 19172, 7860, 7226, 15206, 31506, 13581, 12356, 2424,\n", + " 29176, 28383, 1964, 15063, 3373, 30104, 22827, 26969, 31149,\n", + " 8929, 21050, 17803, 26325, 20349, 170, 21349, 9271, 6938,\n", + " 1115, 28638, 27742, 6784, 22036, 24586, 25323, 1397, 1649,\n", + " 21863, 2279, 22802, 4233, 4134, 29405, 4721, 17180, 11102,\n", + " 31819, 28859, 476, 23177, 29408, 24049, 5026, 13457, 2267,\n", + " 23273, 18454, 14555, 19882, 15548, 19311, 27900, 31037, 24371,\n", + " 12639, 4716, 23128, 18700, 13460, 1223, 17807, 14073, 6324],\n", + " [28073, 14887, 8094, 29888, 30871, 22065, 3576, 10840, 15791,\n", + " 25776, 18585, 12696, 3850, 24351, 2267, 2993, 13708, 8596,\n", + " 22762, 12654, 7751, 27027, 29957, 26241, 20083, 4850, 10905,\n", + " 15395, 17023, 26495, 19274, 15869, 19036, 27350, 14358, 21701,\n", + " 12954, 24876, 8412, 19410, 18644, 4436, 8881, 28932, 28105,\n", + " 9048, 20711, 25427, 26394, 19509, 26426, 5764, 12757, 
25558,\n", + " 19141, 13606, 31124, 11529, 12995, 6525, 18384, 30323, 21503,\n", + " 4762, 1144, 25314, 25638, 17143, 7943, 22824, 22665, 14542],\n", + " [ 8734, 14774, 15053, 22977, 11008, 14026, 15342, 3554, 854,\n", + " 17472, 23770, 15259, 26244, 11156, 16844, 1175, 26715, 9180,\n", + " 15703, 19322, 30378, 4798, 4249, 16533, 12248, 15761, 3797,\n", + " 23640, 21332, 26114, 24196, 8412, 25555, 8806, 17195, 15741,\n", + " 11501, 4463, 18720, 6523, 30750, 16390, 25409, 19032, 1192,\n", + " 20408, 10301, 31946, 29912, 2931, 9093, 10539, 4648, 31751,\n", + " 24813, 24842, 8402, 3866, 28745, 26607, 8885, 26869, 13440,\n", + " 2361, 10348, 23461, 17655, 9538, 24317, 24002, 23711, 19507],\n", + " [20681, 22178, 6759, 26182, 23603, 14513, 30301, 5438, 17831,\n", + " 9621, 25190, 19349, 11788, 24768, 5845, 6541, 27546, 2919,\n", + " 19595, 955, 8535, 12929, 20763, 29832, 21078, 2328, 10863,\n", + " 17892, 5082, 7884, 21420, 22107, 3242, 16307, 28868, 31800,\n", + " 14964, 13342, 7417, 7730, 4597, 31800, 13006, 7866, 4688,\n", + " 6265, 8481, 12363, 9197, 14503, 18132, 17563, 25826, 6762,\n", + " 19442, 15642, 3270, 21740, 28046, 17598, 29722, 5683, 11687,\n", + " 21518, 1491, 29782, 13832, 3291, 11776, 19105, 9432, 24817],\n", + " [ 2281, 30813, 29142, 31103, 17059, 27674, 24746, 2753, 14259,\n", + " 13162, 27555, 20389, 30173, 16573, 6435, 886, 7047, 13766,\n", + " 25416, 23059, 2787, 3705, 26428, 14210, 11678, 30209, 11519,\n", + " 14181, 10191, 3713, 26011, 22138, 28427, 27298, 8008, 6611,\n", + " 4927, 12607, 31287, 26706, 10243, 11705, 2863, 31464, 12841,\n", + " 398, 8985, 6972, 8573, 4230, 20879, 6163, 13199, 19599,\n", + " 11855, 18812, 13303, 2368, 31514, 13648, 28279, 14511, 19608,\n", + " 9503, 11494, 9560, 14941, 31090, 5664, 1005, 6882, 7334],\n", + " [21642, 28651, 4239, 31270, 22920, 2733, 2614, 20510, 26668,\n", + " 596, 2237, 26641, 23547, 27697, 17258, 18297, 7523, 22222,\n", + " 23671, 13238, 8692, 27458, 6950, 6392, 22839, 29692, 6827,\n", + " 1923, 
22292, 4563, 638, 24575, 702, 26437, 14252, 10517,\n", + " 2329, 11463, 31996, 26343, 4543, 14744, 4860, 171, 19283,\n", + " 29326, 17165, 14221, 14317, 17032, 21910, 4096, 30839, 14664,\n", + " 13125, 12924, 29338, 22510, 2294, 26486, 14079, 5307, 23237,\n", + " 5674, 12073, 11821, 11683, 10327, 5611, 20650, 27570, 30479],\n", + " [29589, 23008, 21286, 14232, 19157, 3022, 25141, 13909, 31895,\n", + " 30616, 520, 3764, 8229, 9508, 29627, 13615, 16755, 20234,\n", + " 28629, 4910, 10158, 1790, 23695, 21559, 15206, 23256, 3946,\n", + " 14038, 11572, 21260, 10516, 30856, 11683, 27579, 6546, 29923,\n", + " 13803, 25248, 28312, 12703, 20991, 14064, 8449, 19784, 2763,\n", + " 25093, 5278, 17380, 6877, 30872, 25644, 27530, 5011, 4991,\n", + " 24005, 7766, 15518, 20865, 29947, 24200, 28585, 10705, 11637,\n", + " 4306, 24922, 25641, 11231, 19239, 9249, 21678, 29688, 3984]],\n", + " dtype=int32), 10.374851, 0.0]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [06:34<00:00, 3.96it/s, accuracy=0.149, cost=5.87]\n", + "minibatch loop: 100%|██████████| 40/40 [00:05<00:00, 6.99it/s, accuracy=0.161, cost=5.22]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/40.dnc-seq2seq-bahdanau-greedy.ipynb b/neural-machine-translation/40.dnc-seq2seq-bahdanau-greedy.ipynb deleted file mode 100644 index e9d6af6..0000000 --- a/neural-machine-translation/40.dnc-seq2seq-bahdanau-greedy.ipynb +++ /dev/null @@ -1,538 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```bash\n", - "pip install tensorflow-gpu==1.2\n", - "pip install dm-sonnet -U\n", - "pip install tensorflow-probability==0.5.0\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/tf_inspect.py:75: DeprecationWarning: inspect.getargspec() is deprecated since Python 3.0, use inspect.signature() or inspect.getfullargspec()\n", - " return _inspect.getargspec(target)\n" - ] - } - ], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os\n", - "from dnc import DNC" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " 
count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "num_reads = 5\n", - "num_writes = 1\n", - "memory_size = 128\n", - "word_size = 128\n", - "clip_value = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, batch_size,\n", - " attn_input_feeding=True):\n", - " \n", - " def attn_decoder_input_fn(inputs, attention):\n", - " if attn_input_feeding:\n", - " return inputs\n", - " \n", - " def attention(encoder_out, cell, seq_len, encoder_last_state, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " 
memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = cell, \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer,\n", - " cell_input_fn=attn_decoder_input_fn,\n", - " initial_cell_state=encoder_last_state,\n", - " alignment_history=False)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " access_config = {\n", - " \"memory_size\": memory_size,\n", - " \"word_size\": word_size,\n", - " \"num_reads\": num_reads,\n", - " \"num_writes\": num_writes,\n", - " }\n", - " controller_config = {\n", - " \"hidden_size\": size_layer,\n", - " }\n", - " self.dnc_cell = DNC(access_config=access_config, controller_config=controller_config,\n", - " output_size=size_layer, clip_value=clip_value)\n", - " self.dnc_initial = self.dnc_cell.initial_state\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " initial_state = self.dnc_initial(batch_size)\n", - " self.encoder_out, self.encoder_state = tf.nn.dynamic_rnn(\n", - " cell=self.dnc_cell, inputs=encoder_embedded,\n", - " sequence_length=self.X_seq_len, dtype=tf.float32,\n", - " initial_state=initial_state)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = attention(self.encoder_out, self.dnc_cell, self.X_seq_len,self.encoder_state)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " 
inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size=batch_size, dtype=tf.float32),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " output_time_major=False,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size=batch_size, dtype=tf.float32),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = 
tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Sonnet nest is deprecated. Please use tf.contrib.framework.nest instead. In addition, `map` is renamed to `map_structure`.\n", - "WARNING:tensorflow:From /home/husein/addressing.py:35: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "keep_dims is deprecated, use keepdims instead\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/numpy/lib/type_check.py:546: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead\n", - " 'a.item() instead', DeprecationWarning, stacklevel=1)\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "X 
= str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.409067, avg accuracy: 0.069264\n", - "epoch: 2, avg loss: 5.859403, avg accuracy: 0.089123\n", - "epoch: 3, avg loss: 5.727355, avg accuracy: 0.105535\n", - "epoch: 4, avg loss: 5.574304, avg accuracy: 0.115628\n", - "epoch: 5, avg loss: 5.404255, avg accuracy: 0.125040\n", - "epoch: 6, avg loss: 5.201616, avg accuracy: 0.142058\n", - "epoch: 7, avg loss: 4.965463, avg accuracy: 0.156692\n", - "epoch: 8, avg loss: 4.678100, avg accuracy: 0.177471\n", - "epoch: 9, avg loss: 4.390165, avg accuracy: 0.191455\n", - "epoch: 10, avg loss: 4.087937, avg accuracy: 0.218103\n", - "epoch: 11, avg loss: 3.786239, avg accuracy: 0.255813\n", - "epoch: 12, avg loss: 3.450217, avg accuracy: 0.298548\n", - "epoch: 13, avg loss: 3.129698, avg accuracy: 0.346350\n", - "epoch: 14, avg loss: 2.765721, avg accuracy: 0.403250\n", - "epoch: 15, avg loss: 2.383927, avg accuracy: 0.466340\n", - "epoch: 16, avg loss: 2.102273, avg accuracy: 0.514097\n", - "epoch: 17, avg loss: 1.850025, avg accuracy: 0.560891\n", - "epoch: 18, avg loss: 1.609209, avg accuracy: 0.605419\n", - "epoch: 19, avg loss: 1.363634, avg accuracy: 0.664943\n", - "epoch: 20, avg loss: 1.131599, avg accuracy: 0.711184\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - 
" total_loss, total_accuracy = 0, 0\n", - " for k in range(0, (len(text_to) // batch_size) * batch_size, batch_size):\n", - " index = k+batch_size\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) // batch_size)\n", - " total_accuracy /= (len(text_to) // batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: and as you can imagine , i hated that moment of ripping with incredible intensity .\n", - "REAL ANSWER: vì tôi bị bỏng 70 % cơ thể nên mất khoảng 1 tiếng tháo băng .\n", - "PREDICTED ANSWER: vì tôi huýt gió cùng với cơn đau . \n", - "\n", - "row 2\n", - "QUESTION: and i would try to reason with them and say , " why don 't we try something else ?\n", - "REAL ANSWER: như bạn có thể tưởng tượng tôi căm ghét cái khoảnh khắc bóc toạc với 1 sức mạnh kinh hồn .\n", - "PREDICTED ANSWER: và 1 : 1 : bóc rất thích , " " do tốt khắc bộ phim không chưa bao giờ thường quên . \n", - "\n", - "row 3\n", - "QUESTION: why don 't we take it a little longer -- maybe two hours instead of an hour -- and have less of this intensity ? "\n", - "REAL ANSWER: và tôi sẽ cố gắng lý sự với họ " tại sao chúng ta không thử cách khác ? "\n", - "PREDICTED ANSWER: và người này vào , không phải đến mọi người không đi trẻ ? \n", - "\n", - "row 4\n", - "QUESTION: and the nurses told me two things .\n", - "REAL ANSWER: " tại sao chúng ta không làm lâu hơn 1 chút 2 tiếng thay vì 1 tiếng , và nhẹ tay hơn ? 
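The removed notebook's `pad_sentence_batch` pads every sentence in a minibatch up to that batch's longest sentence (rather than a global maximum) and records the original lengths for later masking. A standalone, runnable sketch of the same helper (pure Python, no TensorFlow required):

```python
# Sketch of the notebook's pad_sentence_batch: pad each sentence in a
# batch to the batch's longest length and keep the original lengths.
def pad_sentence_batch(sentence_batch, pad_int):
    padded_seqs = []
    seq_lens = []
    max_sentence_len = max(len(sentence) for sentence in sentence_batch)
    for sentence in sentence_batch:
        padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))
        seq_lens.append(len(sentence))
    return padded_seqs, seq_lens

batch, lens = pad_sentence_batch([[5, 6, 7], [8, 9]], pad_int=1)
print(batch)  # [[5, 6, 7], [8, 9, 1]]
print(lens)   # [3, 2]
```

Per-batch padding keeps short batches short, which is why the training loop above re-pads every slice of `X` and `Y` instead of padding the whole corpus once.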
"\n", - "PREDICTED ANSWER: họ có thật lên chất tiếng tốt hơn . \n", - "\n", - "row 5\n", - "QUESTION: they told me that they had the right model of the patient -- that they knew what was the right thing to do to minimize my pain -- and they also told me that the word patient doesn 't mean to make suggestions or to interfere or ...\n", - "REAL ANSWER: và các y tá nói với tôi 2 điều .\n", - "PREDICTED ANSWER: và các y tá trở lại có các dòng lỗi . \n", - "\n", - "row 6\n", - "QUESTION: this is not just in hebrew , by the way .\n", - "REAL ANSWER: họ nói rằng mẫu bệnh nhân đúng mực là những người tin tưởng vào các y tá luôn thao tác đúng để giảm đau tối đa và họ cũng nói rằng bệnh nhân không nên gợi ý hay can thiệp , hoặc ...\n", - "PREDICTED ANSWER: ý nghĩa là những người này theo google , nhưng ngay khi diybio ra diybio . \n", - "\n", - "row 7\n", - "QUESTION: it 's in every language i 've had experience with so far .\n", - "REAL ANSWER: đây không phải bằng chữ hebrew\n", - "PREDICTED ANSWER: hôm nay , tôi phát hiện sự sáng sinh học của việc đó . \n", - "\n", - "row 8\n", - "QUESTION: and , you know , there 's not much -- there wasn 't much i could do , and they kept on doing what they were doing .\n", - "REAL ANSWER: nó bằng mọi thứ ngôn ngữ tôi từng biết\n", - "PREDICTED ANSWER: nếu tôi có một đôi thuyết vi , và điều đó thật sự tuyệt vời . \n", - "\n", - "row 9\n", - "QUESTION: and about three years later , when i left the hospital , i started studying at the university .\n", - "REAL ANSWER: và , bạn biết đấy , không có nhiều nhiều thứ tôi có thể làm và họ tiếp tục làm công việc của mình .\n", - "PREDICTED ANSWER: và nhiều khoa học huýt gió cùng với nếu chúng tôi tới khoa học vài mình và biết điều đó . 
\n", - "\n", - "row 10\n", - "QUESTION: and one of the most interesting lessons i learned was that there is an experimental method that if you have a question you can create a replica of this question in some abstract way , and you can try to examine this question , maybe learn something about the world .\n", - "REAL ANSWER: và khoảng 3 năm sau , khi tôi ra viện , tôi đã bắt đầu học đại học\n", - "PREDICTED ANSWER: và khoảng 3 năm sau , khi tôi ra viện , tôi bắt đầu giúp tôi bắt đầu học đại học và từ một dạng của một nhà sinh học và một dạng của một nhà sinh vật điều đó đã rất tuyệt . \n", - "\n", - "row 11\n", - "QUESTION: so that 's what i did .\n", - "REAL ANSWER: và 1 trong số các bài học thú vị nhất tôi đã học là phương pháp thử nghiệm nghĩa là nếu bạn nghi vấn điều gì , bạn có thể tạo 1 bản mô phỏng nghi vấn một cách trừu tượng , bạn có thể cố gắng kiểm tra nghi vấn , có thể học được chút gì về thế giới .\n", - "PREDICTED ANSWER: ý tôi không phải đến cơ bắp đây . \n", - "\n", - "row 12\n", - "QUESTION: i was still interested in this question of how do you take bandages off burn patients .\n", - "REAL ANSWER: đó là những gì tôi đã làm .\n", - "PREDICTED ANSWER: đó là một cả nghệ này . \n", - "\n", - "row 13\n", - "QUESTION: so originally i didn 't have much money , so i went to a hardware store and i bought a carpenter 's vice .\n", - "REAL ANSWER: tôi vẫn rất quan tâm đến câu hỏi làm cách nào để tháo băng y tế cho bệnh nhân bỏng .\n", - "PREDICTED ANSWER: tôi vẫn quan tâm học việc đọc học việc đọc nên vấn đề mà đó . \n", - "\n", - "row 14\n", - "QUESTION: and i would bring people to the lab and i would put their finger in it , and i would crunch it a little bit .\n", - "REAL ANSWER: ban đầu tôi không có nhiều tiền , vì thế tôi đã đến cửa hàng kim khí và mua 1 cái bàn kẹp thợ mộc .\n", - "PREDICTED ANSWER: ban đầu tôi không có nước ra nước vào lớp đó không . 
\n", - "\n", - "row 15\n", - "QUESTION: and i would crunch it for long periods and short periods , and pain that went up and pain that went down , and with breaks and without breaks -- all kinds of versions of pain .\n", - "REAL ANSWER: sau đó tôi mang mọi người tới phòng thí nhiệm , đặt ngón tay họ vào đó , và tôi kẹp họ 1 chút .\n", - "PREDICTED ANSWER: sau đó tôi tới 1 chút khắp nước như thế , họ có giai đoạn đó và thật sự rất tuyệt . \n", - "\n", - "row 16\n", - "QUESTION: and when i finished hurting people a little bit , i would ask them , so , how painful was this ? or , how painful was this ?\n", - "REAL ANSWER: và tôi kẹp trong 1 khoảng thời gian dài và ngắn , cơn đau lúc tăng lúc giảm , có lúc nghỉ ngơi và có lúc không- tất cả các mức độ đau đớn .\n", - "PREDICTED ANSWER: và tôi kẹp 1 khoảng lúc đó tôi nghe lúc đó , và tôi cố thấm nước như thế nào và tôi sẽ thích sự giải sáng và ngoài điều đó . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/40.gru-birnn-seq2seq-contrib-beam-bahdanaumonotic.ipynb 
b/neural-machine-translation/40.gru-birnn-seq2seq-contrib-beam-bahdanaumonotic.ipynb new file mode 100644 index 0000000..daf711f --- /dev/null +++ b/neural-machine-translation/40.gru-birnn-seq2seq-contrib-beam-bahdanaumonotic.ipynb @@ -0,0 +1,746 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cells(size_layer = size_layer, reuse=False):\n", + " return 
tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongMonotonicAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse=reuse) for _ in range(num_layers)]), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " encoder_out = tf.nn.embedding_lookup(embeddings, self.X)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_out,\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_out = tf.concat((out_fw, out_bw), 2)\n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " encoder_state = tuple([bi_state] * num_layers)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + "\n", + " training_helper = 
tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_out, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " decoder_cell = attention(encoder_out_tiled, X_seq_len_tiled, reuse=True)\n", + " states = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(\n", + " cell_state = encoder_state_tiled)\n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = decoder_cell,\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = states,\n", + " beam_width = beam_width,\n", + " output_layer = dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = 
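The `BeamSearchDecoder` configured above expands the `beam_width` highest-scoring partial hypotheses at every decoding step, and the notebook keeps only the top beam (`predicted_ids[:, :, 0]`) as `fast_result`. A toy pure-Python beam search over a hypothetical scoring function sketches the search procedure only (the scores below are invented for illustration; this is not the `tf.contrib.seq2seq` API):

```python
# Toy beam search. `score(prefix, token)` returns a log-probability for
# appending `token` to `prefix`; finished beams (ending in EOS) are
# carried over unchanged, mirroring how finished beams stop expanding.
def beam_search(start, score, vocab, eos, beam_width=2, max_len=5):
    beams = [([start], 0.0)]  # (token sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == eos:
                candidates.append((seq, logp))
                continue
            for tok in vocab:
                candidates.append((seq + [tok], logp + score(seq, tok)))
        # keep only the beam_width best partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # best beam, analogous to predicted_ids[:, :, 0]

# hypothetical scores: EOS (id 2) becomes likely once the prefix has 3 tokens
def score(seq, tok):
    if len(seq) >= 3:
        return 0.0 if tok == 2 else -5.0
    return {3: -0.1, 4: -0.5, 2: -9.0}[tok]

print(beam_search(start=1, score=score, vocab=[2, 3, 4], eos=2, beam_width=2))
# → [1, 3, 3, 2]
```

With `beam_width=1` this degenerates to greedy decoding; wider beams trade compute for a better chance of finding a higher-probability full sequence.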
self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From 
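In the graph above, accuracy is computed only over real tokens: `tf.sequence_mask` builds a boolean mask from `Y_seq_len`, and `tf.boolean_mask` drops padded positions before predictions are compared with targets. A NumPy sketch of the same masked-token accuracy (the helper name is mine, for illustration):

```python
import numpy as np

# Compare predicted ids with target ids only at non-padding positions,
# as selected by a per-row sequence-length mask (like tf.sequence_mask).
def masked_accuracy(pred_ids, target_ids, seq_lens):
    pred_ids = np.asarray(pred_ids)
    target_ids = np.asarray(target_ids)
    max_len = target_ids.shape[1]
    # position index < sequence length → True for real tokens
    mask = np.arange(max_len)[None, :] < np.asarray(seq_lens)[:, None]
    correct = (pred_ids == target_ids) & mask
    return correct.sum() / mask.sum()

preds   = [[4, 5, 9, 0], [7, 8, 0, 0]]
targets = [[4, 5, 6, 0], [7, 8, 0, 0]]
print(masked_accuracy(preds, targets, seq_lens=[3, 2]))  # 0.8
```

Without the mask, trivially correct padding positions would inflate the score, which is also why `sequence_loss` above weights the cross-entropy by the same `masks` tensor.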
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * 
https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :19: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/beam_search_decoder.py:971: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[25234, 25234, 25234, 25234, 25147, 25147, 25147, 25147, 25147,\n", + " 19416, 19416, 4777, 4777, 1805, 1805, 1805, 1805, 1805,\n", + " 28285, 14894, 14894, 14894, 14894, 30741, 30741, 15395, 15395,\n", + " 28650, 28650, 28650, 1803, 1803, 1803, 1803, 4283, 4283,\n", + " 4283, 22741, 22741, 22741, 22741, 22741, 13653, 13653, 19923,\n", + " 11720, 19923, 19923, 19923, 19923, 19923, 19923, 11645, 11645,\n", + " 11645, 11645, 18418, 18418, 18418, 18418, 18418, 15632, 15632,\n", + " 15632, 23234, 23234, 23234, 31936, 31936, 31936, 31936, 31936],\n", + " [14797, 23849, 23849, 23849, 28739, 18292, 18292, 18292, 18292,\n", + " 31408, 2628, 
12087, 22731, 22731, 22731, 22731, 18471, 18471,\n", + " 2307, 2307, 2307, 2307, 812, 812, 812, 812, 812,\n", + " 12616, 12616, 12616, 12616, 12616, 12616, 12616, 3091, 3091,\n", + " 3091, 15589, 21929, 21929, 3654, 3654, 3654, 3654, 3654,\n", + " 3654, 3654, 24059, 24059, 24059, 24059, 21467, 21467, 21467,\n", + " 21467, 21467, 21467, 21467, 21467, 18344, 23832, 23832, 23832,\n", + " 23832, 23832, 23832, 23832, 23832, 23832, 23832, 27046, 30740],\n", + " [14848, 14848, 7332, 7332, 28455, 28455, 28455, 3283, 3283,\n", + " 2013, 2013, 2013, 2013, 2013, 2013, 2013, 4850, 4850,\n", + " 4850, 4850, 16852, 16852, 16852, 16852, 17203, 25066, 25066,\n", + " 25066, 7375, 10444, 18465, 10444, 19552, 12758, 12758, 12758,\n", + " 12758, 30175, 30175, 30175, 17616, 17616, 17616, 23678, 940,\n", + " 940, 23678, 13394, 13394, 13394, 13394, 13394, 13394, 13394,\n", + " 13394, 21408, 21408, 21408, 21408, 21408, 21408, 18068, 18068,\n", + " 18068, 29317, 29321, 29321, 29321, 29321, 29321, 370, 370],\n", + " [24120, 27115, 27115, 27115, 9553, 29103, 29103, 29103, 29103,\n", + " 29103, 29103, 29103, 29103, 29103, 29103, 7682, 7682, 7682,\n", + " 7682, 7682, 7682, 7682, 17343, 28744, 1320, 3740, 1320,\n", + " 3740, 3740, 3740, 3740, 3740, 3740, 3740, 10404, 10404,\n", + " 23672, 21159, 21159, 21159, 21159, 21159, 11806, 9787, 9787,\n", + " 9787, 13756, 13756, 13756, 13756, 13756, 13756, 13756, 3411,\n", + " 3411, 3411, 15244, 15244, 15244, 15244, 15244, 15244, 15244,\n", + " 1383, 1383, 24905, 24905, 24905, 24905, 24905, 18580, 15406],\n", + " [26720, 24727, 24727, 24727, 30888, 30888, 30888, 30888, 30888,\n", + " 25127, 24731, 21808, 21808, 21808, 21808, 5687, 5687, 5687,\n", + " 5687, 5687, 5687, 4290, 4290, 4290, 4290, 26784, 26784,\n", + " 26784, 26784, 8646, 8646, 8646, 24987, 24987, 24987, 26429,\n", + " 26429, 19665, 19665, 19665, 19665, 20132, 20132, 19025, 19025,\n", + " 19025, 31997, 15658, 4507, 4507, 4507, 4507, 20698, 20698,\n", + " 20698, 20698, 20698, 3458, 3458, 3458, 
30088, 30088, 13561,\n", + " 28484, 13561, 13561, 4802, 11670, 11670, 20468, 20468, 11857],\n", + " [ 2897, 2897, 3903, 16822, 16822, 16822, 12873, 12873, 12873,\n", + " 28086, 28086, 28086, 28086, 17142, 17142, 17142, 17142, 4315,\n", + " 4315, 4315, 5000, 5000, 5000, 5000, 17839, 17839, 6157,\n", + " 3057, 3057, 8726, 8726, 8726, 27976, 27976, 27976, 27976,\n", + " 27976, 16212, 17763, 17763, 17763, 17763, 17763, 17763, 30227,\n", + " 30109, 30109, 30109, 30109, 4367, 1336, 1336, 1336, 1336,\n", + " 1336, 1336, 1336, 1336, 27333, 27333, 27333, 27333, 27333,\n", + " 27333, 18762, 25608, 25608, 25608, 25608, 25608, 18762, 26089],\n", + " [ 7373, 7373, 7373, 10320, 10320, 10320, 730, 730, 730,\n", + " 6795, 22740, 22740, 22740, 22740, 22740, 22740, 24252, 24252,\n", + " 24252, 7706, 7706, 7706, 18645, 18645, 5131, 5131, 5131,\n", + " 23748, 1972, 1972, 1972, 6099, 6099, 1386, 1386, 7538,\n", + " 7538, 370, 370, 370, 370, 370, 20606, 20606, 20606,\n", + " 20606, 20606, 21380, 21380, 21380, 21380, 21380, 21380, 15450,\n", + " 15450, 8617, 8617, 8617, 15360, 15360, 30902, 30902, 12284,\n", + " 12284, 12284, 12284, 16301, 16301, 16301, 16301, 16301, 16301],\n", + " [21004, 10456, 10456, 10456, 6483, 6483, 14991, 14991, 25902,\n", + " 3927, 3927, 3927, 3927, 5534, 5534, 5534, 5534, 8793,\n", + " 856, 856, 856, 7059, 7059, 7059, 7059, 7059, 19732,\n", + " 19732, 11877, 11877, 11877, 3492, 3492, 3492, 387, 387,\n", + " 387, 6076, 6076, 6076, 6076, 25249, 25249, 904, 1026,\n", + " 1026, 1026, 1026, 1026, 1026, 1026, 15494, 15494, 15494,\n", + " 1725, 1725, 1725, 1725, 12929, 12929, 12929, 12929, 12929,\n", + " 23991, 23991, 23991, 19689, 19689, 19689, 7579, 12843, 12843],\n", + " [13979, 13979, 7112, 7112, 7112, 7112, 7112, 7112, 5904,\n", + " 5904, 5904, 8328, 8328, 8328, 8328, 8328, 8328, 10453,\n", + " 27401, 8803, 8803, 8803, 8795, 20232, 20232, 20232, 20232,\n", + " 20232, 10088, 10088, 20232, 20232, 20232, 20232, 20232, 1934,\n", + " 1934, 1934, 19639, 19639, 12584, 
12584, 12584, 12584, 12584,\n", + " 12584, 13290, 13290, 13290, 13290, 19151, 19151, 19151, 19151,\n", + " 19151, 6545, 23261, 23261, 30768, 30768, 30768, 25014, 28074,\n", + " 25014, 25014, 25014, 1222, 1222, 20054, 20054, 9242, 9242],\n", + " [11013, 11013, 9679, 5892, 328, 328, 328, 328, 328,\n", + " 2751, 2751, 2751, 5538, 5538, 2199, 3244, 3244, 3244,\n", + " 3244, 3244, 28821, 28821, 5214, 5214, 5214, 5214, 5214,\n", + " 31146, 5218, 5218, 5218, 5218, 5218, 5218, 4383, 4383,\n", + " 4383, 4383, 27831, 27831, 29018, 29018, 24237, 27614, 27614,\n", + " 27614, 6158, 6158, 6158, 6158, 6158, 6158, 11853, 11853,\n", + " 11853, 25252, 25252, 26890, 26890, 26890, 22828, 22828, 22828,\n", + " 682, 682, 682, 682, 15504, 15504, 15504, 15504, 23702]],\n", + " dtype=int32), 10.374842, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [14:11<00:00, 1.84it/s, accuracy=0.24, cost=4.8] \n", + "minibatch loop: 100%|██████████| 40/40 [00:10<00:00, 3.81it/s, accuracy=0.29, cost=4.02] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.065290846" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/41.lstm-birnn-seq2seq-beam-luongmonotic.ipynb b/neural-machine-translation/41.lstm-birnn-seq2seq-beam-luongmonotic.ipynb deleted file mode 100644 index 020b90d..0000000 --- a/neural-machine-translation/41.lstm-birnn-seq2seq-beam-luongmonotic.ipynb +++ /dev/null @@ -1,439 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['GO', 0], ['PAD', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary 
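The `bleu_hook.compute_bleu` call above scores the beam-decoded test set against the references after special ids (0–3) have been filtered out of both sides. A simplified single-reference corpus BLEU, shown only to illustrate the metric (the actual tensor2tensor implementation differs in details such as smoothing):

```python
import math
from collections import Counter

# Simplified corpus-level BLEU: geometric mean of modified n-gram
# precisions up to 4-grams, multiplied by a brevity penalty.
def ngrams(seq, n):
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def compute_bleu(reference_corpus, translation_corpus, max_order=4):
    matches = [0] * max_order
    totals = [0] * max_order
    ref_len = trans_len = 0
    for ref, hyp in zip(reference_corpus, translation_corpus):
        ref_len += len(ref)
        trans_len += len(hyp)
        for n in range(1, max_order + 1):
            overlap = ngrams(hyp, n) & ngrams(ref, n)  # clipped counts
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    precisions = [m / t if t > 0 else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    bp = 1.0 if trans_len > ref_len else math.exp(1 - ref_len / max(trans_len, 1))
    return geo_mean * bp

refs = [[4, 5, 6, 7, 8]]
print(compute_bleu(refs, refs))  # identical corpora → 1.0
```

A score of ~0.065 as in the output above therefore means few long n-gram matches against the references, consistent with the noisy predicted translations shown earlier.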
= dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' 
'.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = 
lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " self.encoder_out = tf.concat((out_fw, out_bw), 2)\n", - " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", - " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongMonotonicAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - 
" self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer, reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cell,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(cell_state = encoder_state_tiled),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, _reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = 
tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.733504, avg accuracy: 0.570861\n", - "epoch: 2, avg loss: 2.914620, avg accuracy: 0.592852\n", - "epoch: 3, avg loss: 2.742152, avg accuracy: 0.597090\n", - "epoch: 4, avg loss: 2.699837, avg accuracy: 0.597512\n", - "epoch: 5, avg loss: 2.657858, avg accuracy: 0.599251\n", - "epoch: 6, avg loss: 2.620675, avg accuracy: 0.602520\n", - "epoch: 7, avg loss: 2.587916, avg accuracy: 0.605443\n", - "epoch: 8, avg loss: 2.566842, avg accuracy: 0.608643\n", - "epoch: 9, avg loss: 2.552954, avg accuracy: 0.606263\n", - "epoch: 10, avg 
loss: 2.536597, avg accuracy: 0.607675\n", - "epoch: 11, avg loss: 2.506966, avg accuracy: 0.610280\n", - "epoch: 12, avg loss: 2.467059, avg accuracy: 0.614517\n", - "epoch: 13, avg loss: 2.439504, avg accuracy: 0.617566\n", - "epoch: 14, avg loss: 2.465031, avg accuracy: 0.614117\n", - "epoch: 15, avg loss: 2.451032, avg accuracy: 0.620523\n", - "epoch: 16, avg loss: 2.439664, avg accuracy: 0.618421\n", - "epoch: 17, avg loss: 2.421476, avg accuracy: 0.618355\n", - "epoch: 18, avg loss: 2.383590, avg accuracy: 0.621968\n", - "epoch: 19, avg loss: 2.345252, avg accuracy: 0.622283\n", - "epoch: 20, avg loss: 2.326693, avg accuracy: 0.624756\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: nhưng tôi không , , , tôi tôi , tôi , tôi , , tôi , , không không . 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: tôi tôi , , , , tôi tôi , tôi , tôi . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: nhưng tôi , không , tôi tôi tôi \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: đó đó đó đó , giống giống , \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/41.residual-lstm-seq2seq-greedy-luong.ipynb b/neural-machine-translation/41.residual-lstm-seq2seq-greedy-luong.ipynb new file mode 100644 index 0000000..94128c2 --- /dev/null +++ 
b/neural-machine-translation/41.residual-lstm-seq2seq-greedy-luong.ipynb @@ -0,0 +1,841 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cell(size, residual, reuse=False):\n", + " c = tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " if residual:\n", + " c = tf.nn.rnn_cell.ResidualWrapper(c)\n", + " return c\n", + " \n", + " def cells(size = 
size_layer, residual = 1, reuse=False):\n", + " cell_list = []\n", + " for i in range(num_layers):\n", + " cell_list.append(cell(size, i >= residual, reuse=reuse))\n", + " return cell_list\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells(reuse=reuse)), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells()), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " 
initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + 
"cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :42: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :45: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. 
This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[29047, 30444, 14561, 2230, 5786, 26561, 31618, 31001, 31001,\n", + " 17966, 2667, 18957, 28548, 31477, 31477, 11185, 24211, 14213,\n", + " 31284, 22522, 22781, 29498, 29498, 25449, 7342, 7342, 9649,\n", + " 9649, 19618, 9649, 9649, 19618, 19721, 9649, 17234, 17234,\n", + " 29692, 29220, 18658, 31931, 18658, 918, 7650, 7650, 918,\n", + " 12554, 12554, 7166, 7166, 10922, 10922, 19621, 9684, 26497,\n", + " 11469, 11469, 8126, 11469, 29371, 3538, 7402, 31331, 4123,\n", + " 4626, 3462, 24827, 21869, 22296, 16138, 1841, 30793, 27284],\n", + " [ 4497, 22485, 22485, 7264, 7264, 27848, 9481, 7264, 7627,\n", + " 
12491, 9481, 12491, 60, 60, 952, 950, 7049, 12491,\n", + " 11850, 12530, 8881, 6790, 24012, 17025, 22374, 30923, 15264,\n", + " 26414, 10909, 10610, 10610, 10610, 10610, 30261, 3202, 30694,\n", + " 30694, 27625, 12202, 5254, 10782, 602, 25117, 28548, 3751,\n", + " 26891, 2449, 23919, 23919, 30125, 788, 788, 788, 23986,\n", + " 13873, 4432, 7699, 16419, 5933, 10076, 15445, 7699, 8051,\n", + " 8051, 8051, 1001, 8051, 6632, 6632, 24899, 4096, 30416],\n", + " [27600, 31046, 15655, 10677, 27485, 17750, 23156, 29801, 6619,\n", + " 3613, 20412, 12057, 2108, 21962, 11808, 16883, 16883, 23269,\n", + " 19531, 19531, 28856, 4047, 24010, 20950, 4507, 4811, 11449,\n", + " 11449, 24507, 24507, 6381, 5922, 17041, 11177, 9716, 27829,\n", + " 8384, 8384, 264, 8061, 8061, 8061, 15063, 15756, 10785,\n", + " 15756, 22878, 7502, 15996, 19184, 5900, 4267, 6479, 15946,\n", + " 20950, 1893, 30627, 24507, 24507, 24507, 4400, 20908, 21159,\n", + " 21159, 21159, 19483, 10677, 28432, 3372, 10534, 28432, 18325],\n", + " [12203, 18996, 12409, 3728, 28713, 974, 16122, 31624, 7188,\n", + " 3281, 5512, 5512, 20937, 3771, 20468, 25068, 30387, 4656,\n", + " 15289, 13059, 31830, 24046, 30315, 30315, 26249, 25746, 18810,\n", + " 29699, 9023, 9362, 9023, 11538, 31371, 16471, 16471, 7921,\n", + " 26414, 31371, 24578, 22878, 15967, 28207, 9362, 12104, 12104,\n", + " 21523, 6658, 10229, 22323, 31734, 13204, 10534, 4237, 28200,\n", + " 28200, 6381, 28200, 7821, 3177, 26065, 28200, 30670, 28200,\n", + " 16000, 16000, 20426, 1575, 15335, 30038, 30038, 3336, 27314],\n", + " [13686, 17492, 11617, 11874, 2976, 2976, 20012, 20012, 20012,\n", + " 1388, 12356, 8616, 22590, 27891, 398, 9669, 27891, 9669,\n", + " 30034, 30034, 27712, 29167, 29167, 26446, 29167, 26446, 29167,\n", + " 16921, 11874, 16921, 11874, 2976, 2976, 20768, 26796, 398,\n", + " 14140, 8675, 3642, 11806, 14191, 5071, 7559, 29590, 5071,\n", + " 21494, 25894, 17058, 25894, 15782, 10105, 15782, 8381, 8381,\n", + " 7026, 13375, 9622, 13375, 13375, 
12285, 12285, 26249, 8521,\n", + " 12090, 9165, 21815, 12078, 12078, 19783, 25869, 10050, 13126],\n", + " [22945, 14731, 11178, 3654, 20781, 26710, 16267, 29306, 10982,\n", + " 28381, 8969, 291, 29901, 21452, 21452, 21224, 14876, 14365,\n", + " 14876, 14365, 20859, 17800, 16687, 19731, 28533, 28533, 28533,\n", + " 28533, 28533, 6168, 22572, 5254, 291, 291, 11386, 17170,\n", + " 13800, 15713, 18884, 9239, 14699, 14699, 29769, 1522, 14699,\n", + " 29769, 1522, 482, 14570, 14570, 13912, 25764, 30482, 30987,\n", + " 20567, 22999, 10553, 18381, 27172, 7170, 22945, 26476, 26476,\n", + " 8434, 8434, 19247, 15831, 15831, 30940, 31544, 16791, 9225],\n", + " [ 3466, 25681, 208, 157, 29323, 1590, 25711, 8719, 3466,\n", + " 23595, 28953, 11591, 19537, 21211, 25979, 13889, 7861, 18367,\n", + " 4498, 12837, 4498, 12837, 18798, 12837, 13188, 31843, 31843,\n", + " 11664, 1982, 7322, 7322, 5844, 5844, 3576, 3576, 3695,\n", + " 7694, 1434, 17334, 14267, 13756, 2447, 20441, 22909, 22909,\n", + " 499, 19522, 499, 19522, 16526, 2098, 16526, 21833, 9075,\n", + " 16526, 27592, 2153, 27592, 1217, 16403, 6350, 24486, 5396,\n", + " 5396, 6641, 7024, 12966, 14725, 15103, 21713, 2293, 15776],\n", + " [22319, 18195, 20505, 8671, 12086, 7314, 26546, 9895, 19457,\n", + " 13969, 29995, 12627, 12627, 17109, 17109, 17109, 26245, 23099,\n", + " 13119, 2844, 13119, 17261, 25674, 25674, 19698, 550, 25720,\n", + " 6291, 29417, 13609, 29417, 4896, 1632, 2591, 9692, 25012,\n", + " 25956, 6682, 6682, 27489, 27393, 9788, 22855, 4886, 30813,\n", + " 4896, 30813, 26792, 24484, 20746, 15829, 31568, 11678, 21187,\n", + " 30125, 19829, 28550, 11747, 11747, 22008, 11747, 25970, 22008,\n", + " 29645, 4829, 18421, 6507, 10083, 1061, 2219, 2219, 2777],\n", + " [ 95, 28921, 10637, 1132, 7276, 20128, 20128, 5244, 15624,\n", + " 14703, 15624, 17716, 17716, 2400, 2400, 31210, 14543, 7864,\n", + " 19929, 30050, 14663, 4585, 4585, 3062, 6381, 6381, 42,\n", + " 6511, 6511, 23585, 28268, 3182, 7383, 19835, 19835, 4645,\n", 
+ " 2983, 12526, 839, 17420, 321, 3637, 3637, 3637, 4560,\n", + " 3637, 4560, 3637, 4560, 2799, 14897, 42, 11564, 17399,\n", + " 448, 10749, 10749, 21618, 4630, 25664, 20977, 20015, 6139,\n", + " 25811, 24246, 25811, 21178, 30695, 30695, 18636, 21255, 1109],\n", + " [26750, 4249, 25269, 16495, 16495, 15917, 28768, 29954, 27775,\n", + " 12419, 109, 23109, 15289, 19383, 16502, 16502, 16502, 18788,\n", + " 16323, 25332, 8173, 17275, 5936, 26782, 746, 3718, 22759,\n", + " 26978, 15025, 14553, 14553, 31660, 29277, 24142, 1462, 4841,\n", + " 4841, 8717, 27579, 29360, 22154, 22154, 30439, 30439, 30439,\n", + " 31972, 25211, 24329, 16938, 14171, 16938, 23528, 30439, 30439,\n", + " 4501, 31972, 16938, 6580, 29447, 23528, 30439, 30439, 6793,\n", + " 15412, 28569, 28569, 25183, 25183, 18715, 25183, 2876, 2876]],\n", + " dtype=int32), 10.373421, 0.0]" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:57<00:00, 2.38it/s, accuracy=0.33, cost=4.1] \n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.86it/s, accuracy=0.371, cost=3.73]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.1475228" + ] + }, + 
"execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/42.lstm-birnn-seq2seq-beam-bahdanaumonotic.ipynb b/neural-machine-translation/42.lstm-birnn-seq2seq-beam-bahdanaumonotic.ipynb deleted file mode 100644 index 248be2d..0000000 --- a/neural-machine-translation/42.lstm-birnn-seq2seq-beam-bahdanaumonotic.ipynb +++ /dev/null @@ -1,439 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['GO', 0], ['PAD', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " 
data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 
'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, batch_size,\n", - " grad_clip=5.0, beam_width=5, force_teaching_ratio=0.5):\n", - " \n", - " def lstm_cell(size, reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size, initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " self.encoder_out = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = 
tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = lstm_cell(size_layer // 2),\n", - " cell_bw = lstm_cell(size_layer // 2),\n", - " inputs = self.encoder_out,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", - " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " encoder_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauMonotonicAttention(\n", - " num_units = size_layer, \n", - " memory = self.encoder_out,\n", - " memory_sequence_length = self.X_seq_len)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " embedding = decoder_embeddings,\n", - " sampling_probability = 1 - force_teaching_ratio,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = tf.layers.Dense(to_dict_size))\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " 
maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(self.encoder_out, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " decoder_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell(size_layer, reuse=True) for _ in range(num_layers)]),\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = decoder_cell,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = decoder_cell.zero_state(batch_size * beam_width, tf.float32).clone(cell_state = encoder_state_tiled),\n", - " beam_width = beam_width,\n", - " output_layer = tf.layers.Dense(to_dict_size, _reuse=True),\n", - " length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = 
tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", - " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), batch_size,learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.718877, avg accuracy: 0.573021\n", - "epoch: 2, avg loss: 2.899292, avg accuracy: 0.596689\n", - "epoch: 3, avg loss: 2.711875, avg accuracy: 0.598600\n", - "epoch: 4, avg loss: 2.642994, avg accuracy: 0.602767\n", - "epoch: 5, avg loss: 2.607828, avg accuracy: 0.605380\n", - "epoch: 6, avg loss: 2.573920, avg accuracy: 0.606315\n", - "epoch: 7, avg loss: 2.562106, avg accuracy: 0.607669\n", - "epoch: 8, avg loss: 2.564469, avg accuracy: 0.606702\n", - "epoch: 9, avg loss: 2.560219, avg accuracy: 0.606772\n", - "epoch: 10, avg 
loss: 2.503828, avg accuracy: 0.611658\n", - "epoch: 11, avg loss: 2.490917, avg accuracy: 0.612809\n", - "epoch: 12, avg loss: 2.453784, avg accuracy: 0.616815\n", - "epoch: 13, avg loss: 2.450262, avg accuracy: 0.616321\n", - "epoch: 14, avg loss: 2.434511, avg accuracy: 0.620291\n", - "epoch: 15, avg loss: 2.401059, avg accuracy: 0.621035\n", - "epoch: 16, avg loss: 2.359351, avg accuracy: 0.625794\n", - "epoch: 17, avg loss: 2.309210, avg accuracy: 0.630723\n", - "epoch: 18, avg loss: 2.275743, avg accuracy: 0.630880\n", - "epoch: 19, avg loss: 2.235807, avg accuracy: 0.632229\n", - "epoch: 20, avg loss: 2.166886, avg accuracy: 0.639045\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau không , , , , , , , , , , , , . 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu nếu chọn một ở ở ở , , , . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tôi làm thí , \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: sau sau đó đó giống giống tài tài tài tài tài , , , , , , , , , , , , , , \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/42.residual-gru-seq2seq-greedy-luong.ipynb b/neural-machine-translation/42.residual-gru-seq2seq-greedy-luong.ipynb new file mode 100644 index 0000000..5205c4e --- /dev/null +++ 
b/neural-machine-translation/42.residual-gru-seq2seq-greedy-luong.ipynb @@ -0,0 +1,836 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cell(size, residual, reuse=False):\n", + " c = tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", + " if residual:\n", + " c = tf.nn.rnn_cell.ResidualWrapper(c)\n", + " return c\n", + " \n", + " def cells(size = size_layer, residual = 1, reuse=False):\n", + 
" cell_list = []\n", + " for i in range(num_layers):\n", + " cell_list.append(cell(size, i >= residual, reuse=reuse))\n", + " return cell_list\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells(reuse=reuse)), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells()), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = 
dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + 
"metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :42: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :45: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the 
constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 37, 13009, 13009, 12879, 6839, 7171, 7171, 7171, 7171,\n", + " 7171, 7171, 7171, 21304, 22121, 37, 37, 37, 27938,\n", + " 27938, 27938, 28560, 28560, 28560, 28560, 29665, 4627, 26103,\n", + " 26103, 23145, 21377, 1881, 1881, 1881, 13215, 26103, 19720,\n", + " 26103, 16690, 17704, 12269, 
12269, 12269, 31881, 31881, 31881,\n", + " 5813, 5813, 8109, 8109, 2338, 2375, 2375, 2375, 3312,\n", + " 15435, 9656, 15435, 9656, 30516, 19282, 19282, 1878, 1878,\n", + " 22523, 647, 19239, 29303, 647, 19384, 19384, 19384, 13459],\n", + " [27556, 557, 5648, 5648, 14805, 496, 3022, 3022, 11803,\n", + " 11803, 20709, 31037, 31037, 8552, 11767, 11767, 11767, 31806,\n", + " 31806, 6223, 3358, 4836, 3358, 4836, 1348, 27445, 27445,\n", + " 27445, 10765, 7656, 7656, 6202, 1315, 19911, 19911, 19911,\n", + " 3023, 19911, 1494, 2802, 29319, 12890, 17652, 2802, 20040,\n", + " 18043, 26183, 26183, 26183, 24737, 24737, 19702, 19702, 21633,\n", + " 3683, 19702, 6268, 23809, 6268, 6268, 6268, 23328, 6268,\n", + " 6268, 6268, 23328, 6268, 6268, 23328, 6268, 6268, 23328],\n", + " [11697, 25656, 19509, 19509, 8367, 8367, 21622, 24611, 25656,\n", + " 11596, 31850, 31850, 5482, 31850, 29281, 5558, 5558, 30875,\n", + " 20055, 31850, 21962, 21962, 9374, 25681, 19029, 9690, 29405,\n", + " 3850, 27872, 27872, 27872, 27872, 23887, 5558, 5558, 2304,\n", + " 2304, 2304, 27872, 1052, 221, 221, 221, 25748, 17863,\n", + " 17863, 17863, 14477, 23687, 24772, 24772, 21531, 23703, 1542,\n", + " 1542, 13161, 8669, 1542, 1542, 10474, 29642, 345, 16359,\n", + " 16359, 1542, 20547, 4430, 4430, 4430, 9344, 5482, 5482],\n", + " [21394, 27593, 5956, 20801, 20801, 1121, 2000, 2000, 2000,\n", + " 30541, 6713, 6713, 6713, 6713, 6713, 6713, 6713, 6713,\n", + " 20237, 22015, 22015, 6713, 6713, 6713, 6713, 19347, 7776,\n", + " 20828, 20828, 20828, 20828, 29983, 31758, 31758, 31758, 28447,\n", + " 8159, 8159, 8159, 8159, 8159, 8159, 8159, 2084, 4387,\n", + " 4387, 22845, 8159, 2084, 8159, 8159, 2084, 8159, 2084,\n", + " 8159, 2084, 8159, 18880, 4083, 4083, 4083, 4083, 4083,\n", + " 4083, 4083, 11567, 4083, 22687, 13735, 13735, 30318, 13735],\n", + " [15273, 22242, 23385, 11381, 422, 4382, 4382, 17712, 17712,\n", + " 23275, 27700, 28845, 28845, 14008, 3213, 3213, 18995, 18995,\n", + " 22317, 26495, 26495, 8489, 
8489, 28745, 26744, 4815, 28186,\n", + " 28186, 12297, 4936, 4936, 4936, 18040, 16508, 472, 18842,\n", + " 4633, 11943, 11943, 26941, 26941, 2017, 2017, 2017, 2017,\n", + " 2017, 2017, 2017, 14846, 14846, 14846, 18842, 18842, 12763,\n", + " 12763, 12763, 28419, 12763, 12763, 28419, 12763, 9243, 3505,\n", + " 31169, 31169, 31169, 31169, 31169, 31169, 14671, 12700, 8610],\n", + " [24520, 29348, 11931, 21709, 31484, 26354, 31484, 29565, 3070,\n", + " 29429, 28766, 30589, 1668, 1668, 1668, 1668, 1668, 25289,\n", + " 25289, 17613, 25289, 5147, 3685, 3685, 25155, 5988, 5988,\n", + " 5988, 26992, 26992, 17186, 5988, 26992, 26992, 6181, 6181,\n", + " 6181, 6181, 14493, 6181, 14493, 1777, 4146, 1777, 6406,\n", + " 6406, 25359, 25359, 25359, 4540, 7274, 19845, 25359, 25359,\n", + " 7274, 22866, 22866, 13361, 24435, 30348, 3604, 3604, 14427,\n", + " 22634, 3004, 22634, 3004, 3007, 6635, 5363, 3004, 3007],\n", + " [ 2951, 7098, 1964, 27086, 27086, 675, 18800, 6107, 22157,\n", + " 22157, 30604, 18097, 7059, 21811, 21811, 29265, 29265, 15823,\n", + " 8211, 8211, 13208, 13208, 13208, 13208, 11948, 28970, 19054,\n", + " 21811, 6703, 9544, 13663, 16477, 31445, 16477, 30985, 15766,\n", + " 15766, 19407, 30731, 30731, 30731, 25434, 25434, 25434, 25434,\n", + " 25434, 25434, 25434, 25434, 1793, 10750, 18029, 14655, 14655,\n", + " 10200, 13825, 13825, 25780, 6119, 11249, 7245, 19647, 19647,\n", + " 11249, 30859, 30859, 30859, 27677, 23058, 18222, 15269, 15269],\n", + " [20258, 8702, 10705, 10705, 18800, 4373, 21454, 5557, 1435,\n", + " 7948, 17996, 439, 25223, 10016, 28819, 18599, 26296, 22377,\n", + " 22377, 15721, 31291, 31291, 31291, 28054, 17321, 4735, 8799,\n", + " 3162, 14570, 12965, 5183, 5464, 17094, 5183, 17094, 2182,\n", + " 2182, 2182, 2182, 9, 7679, 7679, 28177, 25344, 14900,\n", + " 25344, 14900, 14900, 30556, 30556, 30556, 30556, 17174, 25946,\n", + " 25946, 2234, 25946, 23096, 22227, 22227, 22227, 22227, 15448,\n", + " 15448, 15448, 8448, 8448, 8448, 8448, 8448, 8448, 
8448],\n", + " [ 9717, 9717, 6382, 11767, 11767, 10670, 10296, 2939, 5557,\n", + " 26899, 5557, 1435, 11767, 11767, 11767, 3009, 9416, 9416,\n", + " 19831, 31637, 29692, 21637, 29692, 6382, 29692, 29692, 21637,\n", + " 18158, 29692, 21637, 18158, 29692, 29692, 20742, 31866, 29692,\n", + " 29692, 21637, 12486, 23847, 29751, 4401, 5482, 22170, 12486,\n", + " 6683, 5482, 30940, 30940, 25510, 25510, 525, 25510, 19912,\n", + " 21604, 6683, 21604, 6683, 3213, 3213, 3213, 1702, 1702,\n", + " 1702, 1702, 19491, 1702, 19491, 30525, 30525, 6892, 5167],\n", + " [31012, 23454, 23454, 15517, 16873, 15577, 29436, 502, 1869,\n", + " 31404, 11958, 11958, 31588, 30280, 14949, 17377, 30280, 30280,\n", + " 14949, 10317, 17377, 24187, 30280, 30280, 25336, 25336, 25336,\n", + " 25336, 24683, 9149, 9149, 9149, 15684, 15684, 29905, 29905,\n", + " 31837, 28415, 19309, 21314, 21314, 25681, 25681, 26914, 26914,\n", + " 29905, 12774, 12774, 12774, 12774, 17244, 12774, 17244, 27028,\n", + " 27028, 7959, 5234, 7657, 7571, 617, 7571, 25114, 6202,\n", + " 10941, 6202, 305, 4521, 27234, 13101, 4521, 10581, 10581]],\n", + " dtype=int32), 10.372574, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:44<00:00, 2.43it/s, accuracy=0.296, cost=4.37]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.89it/s, accuracy=0.339, cost=3.72]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + 
"metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5.0574585e-05" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/43.memory-network-basic.ipynb b/neural-machine-translation/43.memory-network-basic.ipynb deleted file mode 100644 index e7d47d3..0000000 --- a/neural-machine-translation/43.memory-network-basic.ipynb +++ /dev/null @@ -1,463 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for 
word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - 
"Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - 
"metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def hop_forward(memory_o, memory_i, response_proj, inputs_len, questions_len):\n", - " match = memory_i\n", - " match = pre_softmax_masking(match, inputs_len)\n", - " match = tf.nn.softmax(match)\n", - " match = post_softmax_masking(match, questions_len)\n", - " response = tf.multiply(match, memory_o)\n", - " return response_proj(response)\n", - "\n", - "\n", - "def pre_softmax_masking(x, seq_len):\n", - " paddings = tf.fill(tf.shape(x), float('-inf'))\n", - " T = tf.shape(x)[1]\n", - " max_seq_len = tf.shape(x)[2]\n", - " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", - " masks = tf.tile(tf.expand_dims(masks, 1), [1, T, 1])\n", - " return tf.where(tf.equal(masks, 0), paddings, x)\n", - "\n", - "\n", - "def post_softmax_masking(x, seq_len):\n", - " T = tf.shape(x)[2]\n", - " max_seq_len = tf.shape(x)[1]\n", - " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", - " masks = tf.tile(tf.expand_dims(masks, -1), [1, 1, T])\n", - " return x * masks\n", - "\n", - "\n", - "def shift_right(x):\n", - " batch_size = tf.shape(x)[0]\n", - " start = tf.to_int32(tf.fill([batch_size, 1], GO))\n", - " return tf.concat([start, x[:, :-1]], 1)\n", - "\n", - "\n", - "def embed_seq(x, vocab_size, zero_pad = True):\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table', [vocab_size, size_layer], tf.float32\n", - " )\n", - " if zero_pad:\n", - " lookup_table = tf.concat(\n", - " (tf.zeros([1, size_layer]), lookup_table[1:, :]), axis = 0\n", - " )\n", - " return tf.nn.embedding_lookup(lookup_table, x)\n", - "\n", - "\n", - "def position_encoding(sentence_size, embedding_size):\n", - " encoding = np.ones((embedding_size, sentence_size), dtype = np.float32)\n", - " ls = 
sentence_size + 1\n", - " le = embedding_size + 1\n", - " for i in range(1, le):\n", - " for j in range(1, ls):\n", - " encoding[i - 1, j - 1] = (i - (le - 1) / 2) * (j - (ls - 1) / 2)\n", - " encoding = 1 + 4 * encoding / embedding_size / sentence_size\n", - " return np.transpose(encoding)\n", - "\n", - "def quest_mem(x, vocab_size, max_quest_len):\n", - " x = embed_seq(x, vocab_size)\n", - " pos = position_encoding(max_quest_len, size_layer)\n", - " return x * pos\n", - "\n", - "class QA:\n", - " def __init__(self, vocab_size_from, vocab_size_to, size_layer, learning_rate, n_hops = 3):\n", - " self.X = tf.placeholder(tf.int32,[None,None])\n", - " self.Y = tf.placeholder(tf.int32,[None,None])\n", - " self.X_seq_len = tf.fill([tf.shape(self.X)[0]],maxlen_question)\n", - " self.Y_seq_len = tf.fill([tf.shape(self.X)[0]],maxlen_answer)\n", - " max_quest_len = maxlen_question\n", - " max_answer_len = maxlen_answer\n", - " \n", - " lookup_table = tf.get_variable('lookup_table', [vocab_size_from, size_layer], tf.float32)\n", - " \n", - " with tf.variable_scope('memory_o'):\n", - " memory_o = quest_mem(self.X, vocab_size_from, max_quest_len)\n", - " \n", - " with tf.variable_scope('memory_i'):\n", - " memory_i = quest_mem(self.X, vocab_size_from, max_quest_len)\n", - " \n", - " with tf.variable_scope('interaction'):\n", - " response_proj = tf.layers.Dense(size_layer)\n", - " for _ in range(n_hops):\n", - " answer = hop_forward(memory_o,\n", - " memory_i,\n", - " response_proj,\n", - " self.X_seq_len,\n", - " self.X_seq_len)\n", - " memory_i = answer\n", - " \n", - " embedding = tf.Variable(tf.random_uniform([vocab_size_to, size_layer], -1, 1))\n", - " cell = tf.nn.rnn_cell.BasicRNNCell(size_layer)\n", - " vocab_proj = tf.layers.Dense(vocab_size_to)\n", - " state_proj = tf.layers.Dense(size_layer)\n", - " init_state = state_proj(tf.layers.flatten(answer))\n", - " \n", - " helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(embedding, 
shift_right(self.Y)),\n", - " sequence_length = tf.to_int32(self.Y_seq_len))\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(cell = cell,\n", - " helper = helper,\n", - " initial_state = init_state,\n", - " output_layer = vocab_proj)\n", - " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder = decoder,\n", - " maximum_iterations = max_answer_len)\n", - " \n", - " helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding = embedding,\n", - " start_tokens = tf.tile(\n", - " tf.constant([GO], \n", - " dtype=tf.int32), \n", - " [tf.shape(init_state)[0]]),\n", - " end_token = EOS)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = cell,\n", - " helper = helper,\n", - " initial_state = init_state,\n", - " output_layer = vocab_proj)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = decoder,\n", - " maximum_iterations = max_answer_len)\n", - " self.training_logits = decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " self.logits = decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, max_answer_len, dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :87: BasicRNNCell.__init__ (from 
tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" - ] - } - ], - "source": [ - "epoch = 20\n", - "batch_size = 16\n", - "size_layer = 256\n", - "\n", - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = QA(len(dictionary_from), len(dictionary_to), size_layer, 1e-3)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.538514, avg accuracy: 0.885373\n", - "epoch: 2, avg loss: 0.724011, avg accuracy: 0.915764\n", - "epoch: 3, avg loss: 0.650569, avg accuracy: 0.920936\n", - "epoch: 4, avg loss: 0.620413, avg accuracy: 0.923500\n", - "epoch: 5, avg loss: 0.597482, avg accuracy: 0.925209\n", - "epoch: 6, avg loss: 0.575253, avg accuracy: 0.927236\n", - "epoch: 7, avg loss: 0.552628, avg accuracy: 0.929255\n", - "epoch: 8, avg loss: 0.529827, avg accuracy: 0.931227\n", - "epoch: 9, avg loss: 0.507104, avg accuracy: 0.933382\n", - "epoch: 10, avg loss: 0.484589, avg accuracy: 0.935700\n", - "epoch: 11, avg loss: 0.462313, avg accuracy: 0.938055\n", - "epoch: 12, avg loss: 0.440341, avg accuracy: 0.940618\n", - "epoch: 13, avg loss: 0.418683, avg accuracy: 0.943500\n", - "epoch: 14, avg loss: 0.397441, avg accuracy: 0.946464\n", - "epoch: 15, avg loss: 0.376639, avg accuracy: 0.949491\n", - "epoch: 16, avg loss: 0.356326, avg accuracy: 0.952673\n", - "epoch: 17, avg loss: 0.336615, avg accuracy: 0.955545\n", - "epoch: 18, avg loss: 0.317487, avg accuracy: 0.958845\n", - "epoch: 19, avg loss: 0.299116, avg accuracy: 0.961973\n", - "epoch: 20, avg loss: 0.281513, avg accuracy: 0.965700\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 
0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_question)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi sẽ hỏi họ bạn có thể làm việc này . \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: và tôi sẽ hỏi họ bạn có thể làm việc này . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi sẽ hỏi họ bạn có thể làm việc này . \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi sẽ hỏi họ bạn có thể làm việc này . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/43.residual-lstm-seq2seq-greedy-bahdanau.ipynb b/neural-machine-translation/43.residual-lstm-seq2seq-greedy-bahdanau.ipynb new file mode 100644 index 0000000..a16a3e1 --- /dev/null +++ b/neural-machine-translation/43.residual-lstm-seq2seq-greedy-bahdanau.ipynb @@ -0,0 +1,811 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = 
data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cell(size, residual, reuse=False):\n", + " c = tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " if residual:\n", + " c = tf.nn.rnn_cell.ResidualWrapper(c)\n", + " return c\n", + " \n", + " def cells(size = size_layer, residual = 1, reuse=False):\n", + " cell_list = []\n", + " for i in range(num_layers):\n", + " cell_list.append(cell(size, i >= residual, reuse=reuse))\n", + " return cell_list\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells(reuse=reuse)), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = 
tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells()), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = 
tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 
2.0.\n", + "WARNING:tensorflow:From :42: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :45: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality 
not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 550, 16178, 30935, 19914, 7969, 3842, 25245, 21716, 5951,\n", + " 23121, 7843, 19285, 29343, 3512, 12827, 17261, 22834, 16377,\n", + " 7956, 31374, 27619, 27619, 3971, 8356, 22652, 28710, 10778,\n", + " 8320, 29746, 30612, 30069, 27202, 7921, 13627, 10458, 18455,\n", + " 3734, 4038, 13534, 4038, 20626, 9384, 19846, 19846, 20432,\n", + " 13803, 12925, 19693, 18837, 18837, 6733, 21716, 21716, 8643,\n", + " 17845, 2382, 2100, 2100, 21323, 19530, 19530, 10818, 10453,\n", + " 170, 10453, 3311, 21309, 17949, 16680, 31154, 29434, 29285],\n", + " [ 9418, 9418, 9418, 9418, 9418, 9418, 28244, 7893, 20489,\n", + " 20489, 24536, 7244, 8562, 7244, 6563, 6563, 29549, 14523,\n", + " 6563, 8362, 21870, 28339, 2679, 20224, 7760, 16111, 14842,\n", + " 13538, 14097, 14097, 10981, 14097, 14097, 3331, 23235, 13295,\n", + " 10198, 10198, 29897, 8509, 16369, 1093, 14166, 925, 12512,\n", + " 8605, 12512, 28037, 12828, 12600, 20725, 16809, 20185, 20725,\n", + " 13925, 3345, 24411, 24411, 29342, 4104, 13925, 13925, 8083,\n", + " 30678, 7786, 7786, 1910, 1910, 25680, 27728, 1910, 25680],\n", + " [10241, 17084, 31811, 19739, 7089, 7990, 7990, 29000, 29000,\n", + " 263, 263, 15445, 4941, 11156, 3691, 10092, 6314, 29087,\n", + " 10245, 29500, 5202, 18170, 11795, 29409, 30081, 29409, 13454,\n", + " 13454, 16098, 9316, 9316, 27406, 27406, 4293, 12557, 12597,\n", + " 15241, 11061, 11061, 30432, 11061, 30432, 11061, 
11061, 30432,\n", + " 16884, 22329, 13382, 15177, 15177, 11061, 11061, 11061, 22238,\n", + " 19785, 19785, 30283, 30283, 15868, 20266, 15868, 835, 12156,\n", + " 16458, 16458, 20014, 17116, 22442, 22245, 27731, 21983, 21983],\n", + " [26261, 22474, 22254, 12511, 14733, 14580, 12349, 1608, 5819,\n", + " 2913, 2913, 2913, 2913, 2913, 2913, 2913, 26172, 2913,\n", + " 2913, 26172, 2913, 2913, 2913, 26172, 2913, 2913, 2913,\n", + " 26172, 2913, 2913, 26172, 2913, 2913, 26172, 2913, 2913,\n", + " 26172, 2913, 2913, 26172, 2913, 2913, 26172, 2913, 2913,\n", + " 26172, 2913, 2913, 26172, 2913, 2913, 26172, 2913, 26172,\n", + " 2913, 2913, 26172, 2913, 2913, 26172, 2913, 26172, 2913,\n", + " 2913, 26172, 2913, 2913, 26172, 2913, 26172, 2913, 2913],\n", + " [28008, 29697, 29697, 17430, 17430, 31573, 14060, 12075, 28255,\n", + " 27331, 27331, 27331, 6300, 6300, 5389, 305, 13520, 3213,\n", + " 27892, 5267, 5267, 8533, 8533, 14667, 26443, 26443, 10931,\n", + " 8113, 19013, 13806, 22255, 12908, 13773, 30347, 30347, 432,\n", + " 12295, 8764, 8114, 8114, 8114, 8114, 1690, 12447, 12447,\n", + " 31180, 12447, 31180, 31273, 31273, 12413, 6515, 6515, 24317,\n", + " 24317, 30300, 25617, 25617, 14465, 15900, 7183, 7183, 20910,\n", + " 21478, 20910, 6631, 26421, 4965, 20910, 20910, 6631, 25326],\n", + " [ 6114, 6114, 6114, 3682, 13692, 13692, 288, 18157, 11648,\n", + " 9007, 9007, 11029, 13579, 5308, 5526, 21512, 5021, 2340,\n", + " 16930, 2340, 2548, 2548, 30638, 25170, 20467, 26673, 26673,\n", + " 7801, 26673, 7801, 2208, 2208, 9996, 29969, 19917, 19917,\n", + " 25179, 25179, 20043, 13915, 20043, 13830, 290, 24096, 18730,\n", + " 24096, 9282, 8589, 28010, 30223, 25996, 25996, 290, 28010,\n", + " 28010, 21852, 4668, 28562, 25996, 21897, 27631, 16704, 31069,\n", + " 29795, 14687, 7718, 15449, 8908, 8908, 13371, 17834, 17834],\n", + " [14672, 14672, 31716, 8949, 27030, 26436, 26436, 26436, 19933,\n", + " 26828, 14098, 21323, 13610, 14371, 27848, 13375, 13375, 30609,\n", + " 28009, 28009, 
4570, 25128, 21315, 11023, 31417, 20491, 19578,\n", + " 20349, 18170, 12675, 18535, 26001, 26001, 26001, 26001, 26001,\n", + " 26001, 24361, 30246, 7023, 30246, 26770, 30606, 22524, 22524,\n", + " 22524, 22524, 22524, 17543, 28252, 1541, 28252, 28252, 2026,\n", + " 8312, 3262, 16879, 16879, 5960, 31565, 31565, 25274, 28383,\n", + " 1311, 2331, 4038, 22628, 12019, 1311, 2331, 3150, 3150],\n", + " [ 2225, 13157, 8859, 7927, 8842, 17006, 17006, 17006, 5822,\n", + " 17006, 30187, 27029, 27029, 19818, 2020, 2020, 2230, 19173,\n", + " 25204, 3687, 15523, 20912, 16499, 20221, 6415, 25982, 11286,\n", + " 11286, 4313, 1224, 19456, 16668, 16049, 15575, 19038, 29184,\n", + " 29184, 16049, 21914, 4991, 3861, 16141, 24213, 24213, 19818,\n", + " 6563, 20913, 2155, 14080, 7918, 17947, 7918, 12807, 23865,\n", + " 30732, 4495, 30654, 17511, 11286, 5906, 24755, 26169, 3355,\n", + " 25259, 18085, 27168, 18085, 16631, 1076, 1076, 7329, 30984],\n", + " [ 8394, 13038, 24975, 13396, 15602, 5994, 6394, 11879, 548,\n", + " 6719, 17534, 17534, 14534, 14534, 5464, 22449, 5467, 74,\n", + " 5467, 22060, 9590, 7397, 27087, 9214, 10826, 5454, 16578,\n", + " 5053, 16555, 31254, 10012, 31254, 1328, 19360, 22533, 12998,\n", + " 18989, 21810, 27390, 18989, 16342, 16342, 4983, 8947, 23785,\n", + " 11101, 14988, 11598, 11598, 3198, 14000, 3198, 25237, 27450,\n", + " 31116, 16168, 27699, 27699, 18626, 31410, 29678, 27696, 29003,\n", + " 11093, 11093, 13026, 21175, 21886, 20771, 22071, 30598, 30598],\n", + " [14579, 14579, 3791, 3791, 8491, 26347, 31673, 9242, 26296,\n", + " 2484, 23307, 27739, 4488, 5750, 21828, 5845, 9316, 9316,\n", + " 9316, 31997, 15983, 23246, 5837, 13298, 11599, 25429, 25429,\n", + " 25880, 20183, 11251, 11251, 19098, 5245, 7399, 21000, 10481,\n", + " 2737, 2737, 2088, 2088, 24504, 26736, 26736, 26736, 12752,\n", + " 2723, 30066, 16284, 13025, 27377, 26213, 12407, 18854, 23246,\n", + " 20058, 6974, 14883, 17249, 17249, 13989, 17328, 16157, 8591,\n", + " 12939, 14479, 12939, 29829, 
12939, 29829, 29829, 17564, 8399]],\n", + " dtype=int32), 10.373791, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:15<00:00, 2.31it/s, accuracy=0.417, cost=3.42]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.87it/s, accuracy=0.478, cost=3.03]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/44.memory-network-lstm.ipynb b/neural-machine-translation/44.memory-network-lstm.ipynb deleted file mode 100644 index 2781a5d..0000000 --- a/neural-machine-translation/44.memory-network-lstm.ipynb +++ /dev/null @@ -1,447 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import 
numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' 
'.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def 
pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def hop_forward(memory_o, memory_i, response_proj, inputs_len, questions_len):\n", - " match = memory_i\n", - " match = pre_softmax_masking(match, inputs_len)\n", - " match = tf.nn.softmax(match)\n", - " match = post_softmax_masking(match, questions_len)\n", - " response = tf.multiply(match, memory_o)\n", - " return response_proj(response)\n", - "\n", - "\n", - "def pre_softmax_masking(x, seq_len):\n", - " paddings = tf.fill(tf.shape(x), float('-inf'))\n", - " T = tf.shape(x)[1]\n", - " max_seq_len = tf.shape(x)[2]\n", - " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", - " masks = tf.tile(tf.expand_dims(masks, 1), [1, T, 1])\n", - " return tf.where(tf.equal(masks, 0), paddings, x)\n", - "\n", - "\n", - "def post_softmax_masking(x, seq_len):\n", - " T = tf.shape(x)[2]\n", - " max_seq_len = tf.shape(x)[1]\n", - " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", - " masks = tf.tile(tf.expand_dims(masks, -1), [1, 1, T])\n", - " return x * masks\n", - "\n", - "\n", - "def shift_right(x):\n", - " batch_size = tf.shape(x)[0]\n", - " start = tf.to_int32(tf.fill([batch_size, 1], GO))\n", - " 
return tf.concat([start, x[:, :-1]], 1)\n", - "\n", - "\n", - "def embed_seq(x, vocab_size, zero_pad = True):\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table', [vocab_size, size_layer], tf.float32\n", - " )\n", - " if zero_pad:\n", - " lookup_table = tf.concat(\n", - " (tf.zeros([1, size_layer]), lookup_table[1:, :]), axis = 0\n", - " )\n", - " return tf.nn.embedding_lookup(lookup_table, x)\n", - "\n", - "\n", - "def position_encoding(sentence_size, embedding_size):\n", - " encoding = np.ones((embedding_size, sentence_size), dtype = np.float32)\n", - " ls = sentence_size + 1\n", - " le = embedding_size + 1\n", - " for i in range(1, le):\n", - " for j in range(1, ls):\n", - " encoding[i - 1, j - 1] = (i - (le - 1) / 2) * (j - (ls - 1) / 2)\n", - " encoding = 1 + 4 * encoding / embedding_size / sentence_size\n", - " return np.transpose(encoding)\n", - "\n", - "def quest_mem(x, vocab_size, max_quest_len):\n", - " x = embed_seq(x, vocab_size)\n", - " pos = position_encoding(max_quest_len, size_layer)\n", - " return x * pos\n", - "\n", - "class QA:\n", - " def __init__(self, vocab_size_from, vocab_size_to, size_layer, learning_rate, n_hops = 3):\n", - " self.X = tf.placeholder(tf.int32,[None,None])\n", - " self.Y = tf.placeholder(tf.int32,[None,None])\n", - " self.X_seq_len = tf.fill([tf.shape(self.X)[0]],maxlen_question)\n", - " self.Y_seq_len = tf.fill([tf.shape(self.X)[0]],maxlen_answer)\n", - " max_quest_len = maxlen_question\n", - " max_answer_len = maxlen_answer\n", - " \n", - " lookup_table = tf.get_variable('lookup_table', [vocab_size_from, size_layer], tf.float32)\n", - " \n", - " with tf.variable_scope('memory_o'):\n", - " memory_o = quest_mem(self.X, vocab_size_from, max_quest_len)\n", - " \n", - " with tf.variable_scope('memory_i'):\n", - " memory_i = quest_mem(self.X, vocab_size_from, max_quest_len)\n", - " \n", - " with tf.variable_scope('interaction'):\n", - " response_proj = tf.layers.Dense(size_layer)\n", - " for _ in range(n_hops):\n", - 
" answer = hop_forward(memory_o,\n", - " memory_i,\n", - " response_proj,\n", - " self.X_seq_len,\n", - " self.X_seq_len)\n", - " memory_i = answer\n", - " \n", - " embedding = tf.Variable(tf.random_uniform([vocab_size_to, size_layer], -1, 1))\n", - " cell = tf.nn.rnn_cell.LSTMCell(size_layer)\n", - " vocab_proj = tf.layers.Dense(vocab_size_to)\n", - " state_proj = tf.layers.Dense(size_layer)\n", - " init_state = state_proj(tf.layers.flatten(answer))\n", - " \n", - " helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(embedding, shift_right(self.Y)),\n", - " sequence_length = tf.to_int32(self.Y_seq_len))\n", - " encoder_state = tf.nn.rnn_cell.LSTMStateTuple(c=init_state, h=init_state)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(cell = cell,\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = vocab_proj)\n", - " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder = decoder,\n", - " maximum_iterations = max_answer_len)\n", - " \n", - " helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding = embedding,\n", - " start_tokens = tf.tile(\n", - " tf.constant([GO], \n", - " dtype=tf.int32), \n", - " [tf.shape(init_state)[0]]),\n", - " end_token = EOS)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = cell,\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = vocab_proj)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = decoder,\n", - " maximum_iterations = max_answer_len)\n", - " self.training_logits = decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " self.logits = decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, max_answer_len, dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = 
tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "epoch = 20\n", - "batch_size = 16\n", - "size_layer = 256\n", - "\n", - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = QA(len(dictionary_from), len(dictionary_to), size_layer, 1e-3)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.657692, avg accuracy: 0.883773\n", - "epoch: 2, avg loss: 0.694845, avg accuracy: 0.914418\n", - "epoch: 3, avg loss: 0.670171, avg accuracy: 0.919145\n", - "epoch: 4, avg loss: 0.655215, avg accuracy: 0.920982\n", - "epoch: 5, avg loss: 0.644285, avg accuracy: 0.922491\n", - "epoch: 6, avg loss: 0.634069, avg accuracy: 0.923673\n", - "epoch: 7, avg loss: 0.623932, avg accuracy: 0.923918\n", - "epoch: 8, avg loss: 0.613115, avg accuracy: 0.924627\n", - "epoch: 9, avg loss: 0.600960, avg accuracy: 0.925236\n", - "epoch: 10, avg loss: 0.587711, avg accuracy: 0.925800\n", - "epoch: 11, avg loss: 0.573947, avg accuracy: 0.926691\n", - "epoch: 12, avg loss: 0.558712, avg accuracy: 0.928582\n", - "epoch: 13, avg loss: 0.542557, avg accuracy: 0.930164\n", - "epoch: 14, avg loss: 0.525535, avg accuracy: 0.932000\n", - "epoch: 15, avg loss: 0.509237, avg accuracy: 0.933582\n", - "epoch: 16, avg loss: 0.493271, avg accuracy: 0.935100\n", - "epoch: 17, avg loss: 0.475748, avg accuracy: 
0.937173\n", - "epoch: 18, avg loss: 0.457230, avg accuracy: 0.938718\n", - "epoch: 19, avg loss: 0.439197, avg accuracy: 0.940700\n", - "epoch: 20, avg loss: 0.420498, avg accuracy: 0.942591\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_question)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi đã làm ra một chiếc của tôi đã cho các phép và bạn có thể làm cho những gì bạn có thể làm được . \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: và tôi đã được làm việc với tôi . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi không phải là một phòng thí nghiệm này . 
\n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi đã làm ra một chiếc của tôi đã nói , nhưng một nhà khoa học và tôi đã làm cho một cách mà chúng tôi đã làm việc với những người như thế này , và những gì không phải là một cách , chúng tôi có thể làm cho một cách của chúng tôi , và chúng tôi đã là một cách , chúng tôi đã làm việc với những người như thế này , và những người có thể tưởng tượng , và chúng tôi đã làm việc với những người , và chúng tôi đã nói , chúng tôi đã phải là một cách , chúng tôi đã làm việc với những người như bạn có thể không thể làm được . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/44.residual-gru-seq2seq-greedy-bahdanau.ipynb b/neural-machine-translation/44.residual-gru-seq2seq-greedy-bahdanau.ipynb new file mode 100644 index 0000000..4790f2e --- /dev/null +++ b/neural-machine-translation/44.residual-gru-seq2seq-greedy-bahdanau.ipynb @@ -0,0 +1,789 @@ +{ + "cells": [ + { + 
"cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5):\n", + " \n", + " def cell(size, residual, reuse=False):\n", + " c = tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", + " if residual:\n", + " c = tf.nn.rnn_cell.ResidualWrapper(c)\n", + " return c\n", + " \n", + " def cells(size = size_layer, residual = 1, reuse=False):\n", + " cell_list = []\n", + " for i in range(num_layers):\n", + " cell_list.append(cell(size, i >= residual, 
reuse=reuse))\n", + " return cell_list\n", + " \n", + " def attention(encoder_out, seq_len, reuse=False):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layer, \n", + " memory = encoder_out,\n", + " memory_sequence_length = seq_len)\n", + " return tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells(reuse=reuse)), \n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = size_layer)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " encoder_out, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell(cells()), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = attention(encoder_out, self.X_seq_len)\n", + " \n", + " states = decoder_cells.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = 
training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = states,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :12: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :42: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :45: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[31388, 28150, 16180, 6919, 6919, 6866, 6573, 6573, 6573,\n", + " 6874, 3797, 29300, 29300, 29300, 10576, 19326, 19326, 11659,\n", + " 11659, 11659, 11659, 2286, 11659, 11659, 2286, 31079, 18128,\n", + " 3603, 3603, 11604, 3016, 13601, 13601, 5148, 5148, 11604,\n", + " 5148, 5148, 11604, 5148, 5148, 5148, 5148, 5148, 30807,\n", + " 
19009, 19009, 7310, 20684, 12319, 12319, 11553, 11553, 11553,\n", + " 11553, 29893, 28315, 11604, 19277, 6431, 6431, 4472, 4472,\n", + " 11945, 11945, 1248, 16250, 16250, 16250, 16250, 16250, 16250],\n", + " [26215, 20221, 31003, 31003, 31003, 31003, 25104, 25104, 10623,\n", + " 28578, 12628, 9565, 1619, 9565, 9565, 9565, 28164, 30223,\n", + " 4401, 30223, 28611, 28611, 27379, 27379, 27379, 27379, 23346,\n", + " 23346, 1179, 23346, 16376, 19601, 18640, 31787, 31787, 8235,\n", + " 31787, 8235, 31787, 11192, 7572, 7572, 7572, 7572, 7572,\n", + " 7572, 7572, 7572, 7572, 7572, 7572, 6822, 18618, 24136,\n", + " 24136, 24136, 24136, 1244, 24136, 1244, 3970, 9457, 16757,\n", + " 16757, 23463, 23463, 23463, 24136, 24136, 24136, 16757, 16757],\n", + " [18723, 16359, 9211, 4150, 18669, 4080, 8011, 7420, 23970,\n", + " 30662, 6350, 10663, 7983, 7983, 26171, 4135, 16555, 20001,\n", + " 30491, 30491, 30491, 30491, 30491, 18101, 31447, 11307, 11307,\n", + " 3970, 13068, 13068, 13068, 13068, 3008, 3008, 24522, 24522,\n", + " 24522, 17974, 211, 7929, 211, 31338, 31338, 31338, 349,\n", + " 2144, 23597, 23597, 19987, 18434, 18669, 27313, 23597, 19987,\n", + " 9153, 9153, 9153, 23102, 27543, 17771, 29149, 25286, 9590,\n", + " 12220, 30916, 30916, 22569, 22569, 25209, 22569, 5066, 1343],\n", + " [26541, 6226, 12097, 12097, 23367, 8503, 27767, 22069, 21095,\n", + " 21095, 21084, 20737, 28095, 28095, 716, 8003, 716, 8003,\n", + " 6720, 28751, 8003, 22354, 22354, 13665, 8003, 22354, 13665,\n", + " 8003, 22354, 13665, 19341, 19341, 19341, 19341, 29149, 30645,\n", + " 28223, 28223, 11023, 22114, 22114, 21128, 26608, 6998, 6998,\n", + " 6998, 6998, 30027, 30027, 30027, 30027, 28010, 30582, 30582,\n", + " 30582, 30582, 12050, 26099, 29223, 28546, 28546, 28546, 26099,\n", + " 4003, 26099, 4003, 31376, 1059, 12519, 15929, 8503, 8503],\n", + " [15161, 13351, 22778, 17879, 13351, 31868, 31868, 31868, 31868,\n", + " 31868, 7712, 4672, 4672, 4672, 4672, 4672, 4672, 7572,\n", + " 7572, 7572, 7572, 
7572, 7572, 10092, 10092, 19790, 19390,\n", + " 17189, 18431, 23658, 14879, 23658, 3698, 14879, 3698, 14879,\n", + " 3698, 24628, 9274, 22876, 23617, 23617, 23617, 22210, 27410,\n", + " 27410, 12271, 27410, 12271, 27410, 12271, 5891, 27490, 5452,\n", + " 5452, 5452, 6381, 5452, 6381, 5452, 30018, 30018, 23649,\n", + " 23649, 23649, 2584, 2584, 2584, 2584, 2584, 2584, 10234],\n", + " [25184, 24488, 931, 931, 15266, 1630, 9623, 9623, 27718,\n", + " 27718, 27718, 10151, 10151, 1315, 18776, 14204, 3028, 16338,\n", + " 27421, 15412, 16617, 530, 530, 530, 18215, 18215, 20011,\n", + " 10244, 3052, 3052, 3052, 3052, 27003, 3052, 27003, 3052,\n", + " 27003, 3052, 27003, 27003, 31724, 29984, 3465, 29984, 29984,\n", + " 3465, 10578, 10578, 17837, 17837, 17837, 17837, 17837, 6401,\n", + " 30543, 30543, 30543, 30543, 1882, 27032, 27032, 13316, 27032,\n", + " 13316, 30910, 27032, 30179, 30179, 28587, 2640, 6385, 6385],\n", + " [10368, 15895, 15895, 15895, 15895, 15255, 25083, 25083, 25083,\n", + " 8501, 8501, 8501, 3315, 11179, 11179, 22950, 11179, 22950,\n", + " 11179, 22950, 11179, 11179, 9686, 3988, 3988, 3988, 3988,\n", + " 4979, 4979, 9736, 4979, 22377, 22377, 22377, 22377, 22377,\n", + " 22377, 22377, 22377, 22377, 25180, 25180, 14586, 14586, 14586,\n", + " 17399, 25180, 27590, 17399, 11807, 11807, 11807, 11807, 11807,\n", + " 11807, 11807, 11807, 30169, 11807, 23702, 23702, 30169, 30169,\n", + " 30169, 30169, 19384, 19384, 7597, 7597, 7597, 7597, 28696],\n", + " [11178, 2126, 852, 20236, 17525, 9149, 9149, 18438, 10058,\n", + " 10058, 10058, 18438, 10058, 1089, 25468, 23640, 14729, 14729,\n", + " 25468, 14826, 27081, 16864, 16864, 13690, 13690, 11723, 13690,\n", + " 10700, 13690, 10700, 3394, 17525, 17525, 17525, 27297, 17456,\n", + " 10528, 10528, 25083, 25083, 25083, 25083, 22553, 25083, 12291,\n", + " 12291, 12291, 22694, 4017, 18744, 27530, 27530, 15042, 15042,\n", + " 15042, 20996, 20996, 9749, 9749, 9492, 16067, 29667, 29667,\n", + " 29667, 5022, 28662, 3234, 11835, 
11835, 11835, 11835, 11835],\n", + " [ 2964, 11628, 25983, 25983, 7179, 31699, 28850, 28850, 30261,\n", + " 23299, 23299, 7866, 7866, 19942, 7866, 28898, 10425, 10425,\n", + " 11622, 24974, 17892, 6537, 6537, 6537, 6537, 30521, 6537,\n", + " 30595, 14775, 14775, 14775, 14775, 14775, 14775, 14775, 14775,\n", + " 30358, 20547, 25106, 25106, 25106, 25106, 4035, 1973, 1973,\n", + " 4035, 19748, 19748, 19748, 19748, 27381, 4268, 30399, 27381,\n", + " 2158, 23847, 2158, 23847, 23847, 11411, 23847, 24335, 24335,\n", + " 9674, 10, 19077, 1888, 1888, 1888, 27861, 26380, 30595],\n", + " [15440, 15440, 15440, 1120, 30332, 17304, 17304, 17304, 17304,\n", + " 15861, 15736, 30194, 29671, 29671, 1551, 22475, 29671, 29671,\n", + " 25726, 25726, 11818, 5322, 11818, 25326, 31583, 5322, 25326,\n", + " 31583, 9514, 9514, 9217, 9217, 25326, 9227, 9227, 9227,\n", + " 9227, 5345, 9227, 5345, 9227, 9227, 9227, 9227, 7156,\n", + " 7156, 7156, 26868, 26868, 26868, 23870, 23870, 26560, 26560,\n", + " 26560, 26560, 26560, 2615, 13758, 13758, 11716, 11716, 9702,\n", + " 9702, 6636, 6636, 6636, 16759, 159, 16402, 31991, 31991]],\n", + " dtype=int32), 10.376651, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [11:15<00:00, 2.31it/s, accuracy=0.415, cost=3.48]\n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.87it/s, accuracy=0.462, cost=2.86]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)\n", + " \n", + "rights = []\n", + "for r in 
test_Y:\n", + " rights.append([i for i in r if i > 3])\n", + " \n", + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/45.attention-is-all-you-need.ipynb b/neural-machine-translation/45.attention-is-all-you-need.ipynb deleted file mode 100644 index f05f223..0000000 --- a/neural-machine-translation/45.attention-is-all-you-need.ipynb +++ /dev/null @@ -1,516 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - 
] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = 
build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - "\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " \n", - " outputs = gamma * normalized + beta\n", - " return outputs\n", - "\n", - "\n", - "def multihead_attn(queries, keys, q_masks, k_masks, future_binding, num_units, num_heads):\n", - " \n", - " T_q = tf.shape(queries)[1] \n", - " T_k = tf.shape(keys)[1] \n", - "\n", - " Q = tf.layers.dense(queries, num_units, name='Q') \n", - " K_V = tf.layers.dense(keys, 2*num_units, name='K_V') \n", - " K, V = tf.split(K_V, 2, -1) \n", - "\n", - " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) \n", - " K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) \n", - " V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) \n", - "\n", - " align = tf.matmul(Q_, tf.transpose(K_, [0,2,1])) \n", - " align = align / np.sqrt(K_.get_shape().as_list()[-1]) \n", - "\n", - " paddings = tf.fill(tf.shape(align), 
float('-inf')) \n", - "\n", - " key_masks = k_masks \n", - " key_masks = tf.tile(key_masks, [num_heads, 1]) \n", - " key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, T_q, 1]) \n", - " align = tf.where(tf.equal(key_masks, 0), paddings, align) \n", - "\n", - " if future_binding:\n", - " lower_tri = tf.ones([T_q, T_k]) \n", - " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense() \n", - " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0], 1, 1]) \n", - " align = tf.where(tf.equal(masks, 0), paddings, align) \n", - " \n", - " align = tf.nn.softmax(align) \n", - " query_masks = tf.to_float(q_masks) \n", - " query_masks = tf.tile(query_masks, [num_heads, 1]) \n", - " query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, T_k]) \n", - " align *= query_masks \n", - " \n", - " outputs = tf.matmul(align, V_) \n", - " outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2) \n", - " outputs += queries \n", - " outputs = layer_norm(outputs) \n", - " return outputs\n", - "\n", - "\n", - "def pointwise_feedforward(inputs, hidden_units, activation=None):\n", - " outputs = tf.layers.dense(inputs, 4*hidden_units, activation=activation)\n", - " outputs = tf.layers.dense(outputs, hidden_units, activation=None)\n", - " outputs += inputs\n", - " outputs = layer_norm(outputs)\n", - " return outputs\n", - "\n", - "\n", - "def learned_position_encoding(inputs, mask, embed_dim):\n", - " T = tf.shape(inputs)[1]\n", - " outputs = tf.range(tf.shape(inputs)[1]) # (T_q)\n", - " outputs = tf.expand_dims(outputs, 0) # (1, T_q)\n", - " outputs = tf.tile(outputs, [tf.shape(inputs)[0], 1]) # (N, T_q)\n", - " outputs = embed_seq(outputs, T, embed_dim, zero_pad=False, scale=False)\n", - " return tf.expand_dims(tf.to_float(mask), -1) * outputs\n", - "\n", - "\n", - "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n", - " T = tf.shape(inputs)[1]\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i 
= np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), -1)\n", - "\n", - "\n", - "def label_smoothing(inputs, epsilon=0.1):\n", - " C = inputs.get_shape().as_list()[-1]\n", - " return ((1 - epsilon) * inputs) + (epsilon / C)\n", - "\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, embedded_size, from_dict_size, to_dict_size, learning_rate,\n", - " num_blocks = 2,\n", - " num_heads = 8,\n", - " min_freq = 50):\n", - " self.X = tf.placeholder(tf.int32,[None,None])\n", - " self.Y = tf.placeholder(tf.int32,[None,None])\n", - " \n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " \n", - " def forward(x, y):\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, x)\n", - " en_masks = tf.sign(x)\n", - " encoder_embedded += sinusoidal_position_encoding(x, en_masks, embedded_size)\n", - " \n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('encoder_self_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " encoder_embedded = multihead_attn(queries = encoder_embedded,\n", - " keys = encoder_embedded,\n", - " q_masks = en_masks,\n", - " k_masks = en_masks,\n", - " future_binding = False,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - "\n", - " with 
tf.variable_scope('encoder_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " encoder_embedded = pointwise_feedforward(encoder_embedded,\n", - " embedded_size,\n", - " activation = tf.nn.relu)\n", - " \n", - " decoder_embedded = tf.nn.embedding_lookup(decoder_embedding, y)\n", - " de_masks = tf.sign(y)\n", - " decoder_embedded += sinusoidal_position_encoding(y, de_masks, embedded_size)\n", - " \n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('decoder_self_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = decoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = True,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('decoder_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = encoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = en_masks,\n", - " future_binding = False,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('decoder_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = pointwise_feedforward(decoder_embedded,\n", - " embedded_size,\n", - " activation = tf.nn.relu)\n", - " \n", - " return tf.layers.dense(decoder_embedded, to_dict_size, reuse=tf.AUTO_REUSE)\n", - " \n", - " self.training_logits = forward(self.X, decoder_input)\n", - " \n", - " def cond(i, y, temp):\n", - " return i < 2 * tf.reduce_max(self.X_seq_len)\n", - " \n", - " def body(i, y, temp):\n", - " logits = forward(self.X, y)\n", - " ids = tf.argmax(logits, -1)[:, i]\n", - " ids = tf.expand_dims(ids, -1)\n", - " temp = tf.concat([temp[:, 1:], ids], -1)\n", - " y = tf.concat([temp[:, -(i+1):], temp[:, :-(i+1)]], -1)\n", - " y = tf.reshape(y, [tf.shape(temp)[0], 2 * tf.reduce_max(self.X_seq_len)])\n", - " i += 1\n", - " return i, y, temp\n", - " \n", - " target = 
tf.fill([batch_size, 2 * tf.reduce_max(self.X_seq_len)], GO)\n", - " target = tf.cast(target, tf.int64)\n", - " self.target = target\n", - " \n", - " _, self.predicting_ids, _ = tf.while_loop(cond, body, \n", - " [tf.constant(0), target, target])\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "embedded_size = 256\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(embedded_size, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = 
str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.446041, avg accuracy: 0.063624\n", - "epoch: 2, avg loss: 6.000207, avg accuracy: 0.087263\n", - "epoch: 3, avg loss: 5.758100, avg accuracy: 0.105811\n", - "epoch: 4, avg loss: 5.600954, avg accuracy: 0.123585\n", - "epoch: 5, avg loss: 5.564998, avg accuracy: 0.113341\n", - "epoch: 6, avg loss: 5.493285, avg accuracy: 0.120724\n", - "epoch: 7, avg loss: 5.529355, avg accuracy: 0.117925\n", - "epoch: 8, avg loss: 5.522790, avg accuracy: 0.119511\n", - "epoch: 9, avg loss: 5.488758, avg accuracy: 0.124838\n", - "epoch: 10, avg loss: 5.481946, avg accuracy: 0.124881\n", - "epoch: 11, avg loss: 5.495231, avg accuracy: 0.115534\n", - "epoch: 12, avg loss: 5.358830, avg accuracy: 0.120977\n", - "epoch: 13, avg loss: 5.233122, avg accuracy: 0.136445\n", - "epoch: 14, avg loss: 5.153393, avg accuracy: 0.143617\n", - "epoch: 15, avg loss: 5.057733, avg accuracy: 0.152985\n", - "epoch: 16, avg loss: 4.906600, avg accuracy: 0.162316\n", - "epoch: 17, avg loss: 4.826649, avg accuracy: 0.164304\n", - "epoch: 18, avg loss: 4.789770, avg accuracy: 0.162977\n", - "epoch: 19, avg loss: 4.777955, avg accuracy: 0.166525\n", - "epoch: 20, avg loss: 4.712728, avg accuracy: 0.170279\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k 
in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? 
đau như thế nào ?\n", - "PREDICTED ANSWER: và đây là một một đây một một một lớp lớp một một một một một một một những những cách cách những những những những lớp cách cách cách cách cách cách cách cách cách những những một một một những cách cách những những những những những cách \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: và đây là một một đây một một một lớp lớp một một một một một một một những những cách cách những những những những lớp cách cách cách cách cách cách cách cách cách những những một một một những cách cách những những những những những cách \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và đây là một một đây một một một lớp lớp một một một một một một một những những cách cách những những những những lớp cách cách cách cách cách cách cách cách cách những những một một một những cách cách những những những những những cách \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và đây là một một đây một một một lớp lớp một một một một một một một những những cách cách những những những những lớp cách cách cách cách cách cách cách cách cách những những một một một những cách cách những những những những những cách \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED 
ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/45.memory-network-lstm-decoder-greedy.ipynb b/neural-machine-translation/45.memory-network-lstm-decoder-greedy.ipynb new file mode 100644 index 0000000..ed31a17 --- /dev/null +++ b/neural-machine-translation/45.memory-network-lstm-decoder-greedy.ipynb @@ -0,0 +1,839 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y 
= [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "def hop_forward(memory_o, memory_i, response_proj, inputs_len, questions_len):\n", + " match = memory_i\n", + " match = pre_softmax_masking(match, inputs_len)\n", + " match = tf.nn.softmax(match)\n", + " match = post_softmax_masking(match, questions_len)\n", + " response = tf.multiply(match, memory_o)\n", + " return response_proj(response)\n", + "\n", + "\n", + "def pre_softmax_masking(x, seq_len):\n", + " paddings = tf.fill(tf.shape(x), float('-inf'))\n", + " T = tf.shape(x)[1]\n", + " max_seq_len = tf.shape(x)[2]\n", + " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", + " masks = tf.tile(tf.expand_dims(masks, 1), [1, T, 1])\n", + " return tf.where(tf.equal(masks, 0), paddings, x)\n", + "\n", + "\n", + "def post_softmax_masking(x, seq_len):\n", + " T = tf.shape(x)[2]\n", + " max_seq_len = tf.shape(x)[1]\n", + " masks = tf.sequence_mask(seq_len, max_seq_len, dtype = tf.float32)\n", + " masks = tf.tile(tf.expand_dims(masks, -1), [1, 1, T])\n", + " return x * masks\n", + "\n", + "def embed_seq(x, vocab_size, zero_pad = True):\n", + " lookup_table = tf.get_variable(\n", + " 'lookup_table', [vocab_size, size_layer], tf.float32\n", + " )\n", + " if zero_pad:\n", + " lookup_table = tf.concat(\n", + " (tf.zeros([1, size_layer]), lookup_table[1:, :]), axis = 0\n", + " )\n", + " return tf.nn.embedding_lookup(lookup_table, x)\n", + "\n", + "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n", + " T = tf.shape(inputs)[1]\n", + " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 
1])\n", + " i = np.arange(0, repr_dim, 2, np.float32)\n", + " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", + " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", + " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), -1)\n", + "\n", + "def quest_mem(x, vocab_size, size_layer):\n", + " en_masks = tf.sign(x)\n", + " x = embed_seq(x, vocab_size)\n", + " x += sinusoidal_position_encoding(x, en_masks, size_layer)\n", + " return x\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate,\n", + " beam_width = 5, n_hops = 3):\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " lookup_table = tf.get_variable('lookup_table', [vocab_size, size_layer], tf.float32)\n", + " \n", + " with tf.variable_scope('memory_o'):\n", + " memory_o = quest_mem(self.X, vocab_size, size_layer)\n", + " \n", + " with tf.variable_scope('memory_i'):\n", + " memory_i = quest_mem(self.X, vocab_size, size_layer)\n", + " \n", + " with tf.variable_scope('interaction'):\n", + " response_proj = tf.layers.Dense(size_layer)\n", + " for _ in range(n_hops):\n", + " answer = hop_forward(memory_o,\n", + " memory_i,\n", + " response_proj,\n", + " self.X_seq_len,\n", + " self.X_seq_len)\n", + " memory_i = answer\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = 
tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + "\n", + " init_state = answer[:,-1]\n", + " encoder_state = tf.nn.rnn_cell.LSTMStateTuple(c=init_state, h=init_state)\n", + " encoder_state = tuple([encoder_state] * num_layers)\n", + " \n", + " print(encoder_state)\n", + " vocab_proj = tf.layers.Dense(vocab_size)\n", + " \n", + " helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(lookup_table, decoder_input),\n", + " sequence_length = tf.to_int32(self.Y_seq_len))\n", + " \n", + " decoder = tf.contrib.seq2seq.BasicDecoder(cell = decoder_cells,\n", + " helper = helper,\n", + " initial_state = encoder_state,\n", + " output_layer = vocab_proj)\n", + " \n", + " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder = decoder,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " \n", + " helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding = lookup_table,\n", + " start_tokens = tf.tile(\n", + " tf.constant([GO], \n", + " dtype=tf.int32), \n", + " [tf.shape(init_state)[0]]),\n", + " end_token = EOS)\n", + " decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = helper,\n", + " initial_state = encoder_state,\n", + " output_layer = vocab_proj)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = decoder,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.training_logits = decoder_output.rnn_output\n", + " self.logits = decoder_output.sample_id\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t 
= tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. 
This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(LSTMStateTuple(c=<...>, h=<...>), LSTMStateTuple(c=<...>, h=<...>))\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 
22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 
16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678],\n", + " [19397, 19397, 27768, 27768, 17696, 17696, 27768, 25642, 25642,\n", + " 27768, 28355, 28503, 14535, 17902, 17902, 14535, 17254, 24473,\n", + " 24473, 7048, 15065, 17168, 17168, 21212, 21212, 22429, 22429,\n", + " 22429, 25737, 3915, 3915, 11557, 11557, 23311, 10254, 6953,\n", + " 6953, 10254, 3712, 3712, 30643, 
22712, 22712, 22712, 4579,\n", + " 4579, 31011, 31011, 31011, 31011, 6226, 822, 23311, 25129,\n", + " 25129, 20665, 9644, 9644, 31653, 31653, 31653, 31653, 21142,\n", + " 21142, 21142, 23120, 23095, 26751, 13780, 13780, 13780, 16678]],\n", + " dtype=int32), 10.3734865, 0.0]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [06:46<00:00, 3.85it/s, accuracy=0.136, cost=5.99]\n", + "minibatch loop: 100%|██████████| 40/40 [00:05<00:00, 7.31it/s, accuracy=0.151, cost=5.5] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/46.google-nmt.ipynb b/neural-machine-translation/46.google-nmt.ipynb new file mode 100644 index 0000000..8424afd --- /dev/null +++ 
b/neural-machine-translation/46.google-nmt.ipynb @@ -0,0 +1,889 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.python.util import nest\n", + "from tensorflow.python.layers.core import Dense\n", + "\n", + "def gnmt_residual_fn(inputs, outputs):\n", + " def split_input(inp, out):\n", + " out_dim = out.get_shape().as_list()[-1]\n", + " inp_dim = inp.get_shape().as_list()[-1]\n", + " return tf.split(inp, [out_dim, inp_dim - out_dim], axis=-1)\n", + " actual_inputs, _ = nest.map_structure(split_input, inputs, outputs)\n", + "\n", + " def assert_shape_match(inp, out):\n", + " inp.get_shape().assert_is_compatible_with(out.get_shape())\n", + " nest.assert_same_structure(actual_inputs, outputs)\n", + " nest.map_structure(assert_shape_match, actual_inputs, outputs)\n", + " return 
nest.map_structure(lambda inp, out: inp + out, actual_inputs, outputs)\n", + "\n", + "class GNMTAttentionMultiCell(tf.nn.rnn_cell.MultiRNNCell):\n", + "\n", + " def __init__(self, attention_cell, cells, use_new_attention=True):\n", + " cells = [attention_cell] + cells\n", + " self.use_new_attention = use_new_attention\n", + " super(GNMTAttentionMultiCell, self).__init__(\n", + " cells, state_is_tuple=True)\n", + "\n", + " def __call__(self, inputs, state, scope=None):\n", + " \"\"\"Run the cell with bottom layer's attention copied to all upper layers.\"\"\"\n", + " if not nest.is_sequence(state):\n", + " raise ValueError(\n", + " \"Expected state to be a tuple of length %d, but received: %s\"\n", + " % (len(self.state_size), state))\n", + "\n", + " with tf.variable_scope(scope or \"multi_rnn_cell\"):\n", + " new_states = []\n", + "\n", + " with tf.variable_scope(\"cell_0_attention\"):\n", + " attention_cell = self._cells[0]\n", + " attention_state = state[0]\n", + " cur_inp, new_attention_state = attention_cell(\n", + " inputs, attention_state)\n", + " new_states.append(new_attention_state)\n", + "\n", + " for i in range(1, len(self._cells)):\n", + " with tf.variable_scope(\"cell_%d\" % i):\n", + " cell = self._cells[i]\n", + " cur_state = state[i]\n", + "\n", + " if self.use_new_attention:\n", + " cur_inp = tf.concat(\n", + " [cur_inp, new_attention_state.attention], -1)\n", + " else:\n", + " cur_inp = tf.concat(\n", + " [cur_inp, attention_state.attention], -1)\n", + "\n", + " cur_inp, new_state = cell(cur_inp, cur_state)\n", + " new_states.append(new_state)\n", + " return cur_inp, tuple(new_states)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate, beam_width = 5):\n", + " \n", + " def cells(size_layer=size_layer,reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " 
self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, decoder_input)\n", + " \n", + " num_residual_layer = num_layers - 2\n", + " num_bi_layer = 1\n", + " num_ui_layer = num_layers - num_bi_layer\n", + " \n", + " encoder_outputs, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " \n", + " decoder_cells = []\n", + " for n in range(num_layers):\n", + " cell = cells(size_layer)\n", + " if (n >= num_layers - num_residual_layer):\n", + " cell = tf.nn.rnn_cell.ResidualWrapper(cell, residual_fn = gnmt_residual_fn)\n", + " decoder_cells.append(cell)\n", + " attention_cell = decoder_cells.pop(0)\n", + " to_dense = tf.layers.Dense(vocab_size)\n", + " \n", + " with tf.variable_scope('decode'):\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", + " num_units = size_layer, \n", + " memory = encoder_outputs,\n", + " memory_sequence_length = self.X_seq_len)\n", + " att_cell = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = attention_cell,\n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = None,\n", + " alignment_history = True,\n", + " output_attention = False)\n", + " gcell = GNMTAttentionMultiCell(att_cell, decoder_cells)\n", + " \n", + " self.initial_state = tuple(\n", + " zs.clone(cell_state=es)\n", + " if isinstance(zs, 
tf.contrib.seq2seq.AttentionWrapperState) else es\n", + " for zs, es in zip(\n", + " gcell.zero_state(batch_size, dtype=tf.float32), encoder_state))\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " decoder_embedded,\n", + " self.Y_seq_len,\n", + " time_major = False\n", + " )\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = gcell,\n", + " helper = training_helper,\n", + " initial_state = self.initial_state,\n", + " output_layer = to_dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " with tf.variable_scope('decode', reuse=True):\n", + " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_outputs, beam_width)\n", + " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", + " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", + " \n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", + " num_units = size_layer, \n", + " memory = encoder_out_tiled,\n", + " memory_sequence_length = X_seq_len_tiled)\n", + " att_cell = tf.contrib.seq2seq.AttentionWrapper(\n", + " cell = attention_cell,\n", + " attention_mechanism = attention_mechanism,\n", + " attention_layer_size = None,\n", + " alignment_history = False,\n", + " output_attention = False)\n", + " gcell = GNMTAttentionMultiCell(att_cell, decoder_cells)\n", + " \n", + " self.initial_state = tuple(\n", + " zs.clone(cell_state=es)\n", + " if isinstance(zs, tf.contrib.seq2seq.AttentionWrapperState) else es\n", + " for zs, es in zip(\n", + " gcell.zero_state(batch_size * beam_width, dtype=tf.float32), encoder_state_tiled))\n", + " \n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = gcell,\n", + " embedding = embeddings,\n", + 
" start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = self.initial_state,\n", + " beam_width = beam_width,\n", + " output_layer = to_dense,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.predicted_ids[:, :, 0]\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 3\n", + "learning_rate = 1e-3\n", + "batch_size = 64\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. 
You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, size_layer, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[25165, 25165, 25165, 12719, 12719, 12719, 12719, 11384, 4722,\n", + " 4722, 4722, 20577, 20577, 20577, 20577, 20577, 31007, 31007,\n", + " 31007, 31007, 31007, 31007, 31007, 31007, 31007, 31007, 95,\n", + " 95, 95, 95, 10788, 10788, 26129, 25836, 25836, 25836,\n", + " 25836, 25836, 25836, 28841, 22661, 6972, 6972, 6972, 6972,\n", + " 6972, 6972, 18721, 17965, 17965, 17965, 17965, 17965, 20057,\n", + " 20057, 6549, 6549, 4051, 4051, 4051, 27265, 25803, 25803,\n", + " 22130, 3280, 3280, 3280, 21261, 21261, 21261, 12184, 12184],\n", + " [23674, 22063, 20393, 20393, 25949, 25949, 25949, 25949, 25949,\n", + " 24464, 24464, 24464, 24464, 24464, 24464, 9336, 9336, 13471,\n", + " 13471, 13471, 21559, 15741, 30712, 30712, 30712, 3493, 3493,\n", + " 3493, 3493, 3493, 7100, 7100, 7100, 7100, 24171, 7100,\n", + " 24171, 24171, 31383, 31383, 31383, 31383, 31383, 31383, 24730,\n", + " 1435, 1435, 1435, 1435, 1435, 24100, 24100, 17077, 17077,\n", + " 27546, 27546, 27546, 27546, 27546, 1018, 14427, 14427, 14427,\n", + " 14427, 1288, 1288, 1288, 12117, 12117, 11043, 12117, 11043],\n", + " [ 2799, 2799, 15513, 15513, 17585, 17585, 17585, 17585, 31530,\n", + " 31530, 31530, 31530, 31530, 31530, 29816, 29816, 29816, 29816,\n", + " 30605, 30605, 11155, 11155, 25968, 20258, 
20258, 20258, 20258,\n", + " 20258, 24781, 16037, 16037, 16037, 28694, 28694, 28694, 28694,\n", + " 28694, 13465, 13465, 13465, 13465, 13465, 4367, 4367, 4367,\n", + " 4367, 4367, 24648, 24648, 24648, 24648, 20209, 20209, 16959,\n", + " 7939, 7939, 7939, 7939, 7939, 7939, 9431, 884, 884,\n", + " 884, 23502, 23502, 23502, 16055, 16055, 27778, 27778, 27778],\n", + " [23690, 23690, 23466, 23466, 20636, 20636, 6595, 6595, 6595,\n", + " 11354, 11354, 25826, 25826, 31189, 28645, 28645, 28645, 28645,\n", + " 28645, 28645, 28645, 16582, 16582, 16582, 16582, 16582, 26439,\n", + " 26439, 26439, 26439, 26439, 26439, 8884, 7518, 7518, 7518,\n", + " 7518, 5426, 5426, 5426, 5426, 8135, 8135, 20, 20,\n", + " 20, 20, 20, 14917, 14917, 14917, 14917, 12198, 12198,\n", + " 12198, 12198, 12198, 12198, 12198, 10572, 10572, 10572, 19585,\n", + " 19585, 19585, 19585, 19585, 27849, 19670, 27849, 19670, 19670],\n", + " [12009, 1243, 1243, 402, 402, 402, 250, 765, 18652,\n", + " 20325, 20325, 11693, 11693, 11693, 11693, 10145, 14725, 14725,\n", + " 14725, 14725, 14725, 17905, 17905, 17905, 17905, 3595, 3595,\n", + " 3595, 30250, 30250, 30250, 30250, 25109, 25109, 25109, 25109,\n", + " 25109, 1727, 9160, 9160, 9160, 9160, 9160, 29573, 29573,\n", + " 29573, 18413, 18413, 18413, 26676, 26676, 29048, 29048, 8339,\n", + " 10019, 2370, 2370, 2370, 2370, 31397, 31397, 10884, 29898,\n", + " 29898, 29898, 29898, 6381, 6381, 6381, 6381, 6381, 21203],\n", + " [ 6069, 6069, 10613, 2782, 2782, 4022, 5799, 3820, 3820,\n", + " 3820, 735, 735, 735, 735, 5507, 5507, 22278, 22278,\n", + " 22278, 22278, 22278, 22278, 3130, 3130, 3130, 3130, 3130,\n", + " 31193, 4235, 4235, 4235, 4235, 14393, 8583, 8583, 6940,\n", + " 6940, 6940, 6940, 5542, 5542, 5542, 24201, 24201, 24201,\n", + " 24201, 11199, 11199, 5072, 5072, 11904, 11904, 11904, 11904,\n", + " 11904, 381, 381, 381, 26397, 26397, 26397, 30051, 30051,\n", + " 30051, 20504, 20504, 20504, 31872, 8712, 21364, 21364, 19558],\n", + " [24761, 31101, 31101, 
31101, 31101, 13779, 13779, 7655, 7655,\n", + " 7655, 4976, 4976, 4976, 4976, 4976, 4976, 15264, 4881,\n", + " 4881, 4881, 4881, 4881, 4881, 4881, 11239, 16713, 16713,\n", + " 16713, 19941, 19941, 19941, 23736, 24570, 24570, 24570, 21996,\n", + " 15264, 21996, 25539, 21996, 25539, 25539, 10168, 10168, 10168,\n", + " 21111, 24998, 24998, 27812, 5474, 5474, 29701, 29701, 29701,\n", + " 26261, 26261, 26261, 14887, 14887, 14887, 504, 504, 504,\n", + " 21988, 21988, 25783, 28587, 28587, 28587, 25216, 25216, 25216],\n", + " [ 9621, 3849, 3849, 3849, 20056, 20056, 22210, 22210, 22210,\n", + " 22210, 22210, 4889, 4889, 4889, 4889, 4889, 10281, 22904,\n", + " 22904, 22904, 22904, 22904, 22904, 16934, 10234, 14178, 14178,\n", + " 14178, 14178, 8964, 9063, 9063, 9063, 9063, 9063, 9063,\n", + " 9063, 13315, 13315, 13315, 13315, 13315, 13315, 13315, 13315,\n", + " 13315, 13315, 20625, 20625, 20625, 20625, 11666, 1186, 1186,\n", + " 1186, 28330, 9362, 9362, 9362, 10730, 10730, 10730, 10730,\n", + " 10730, 1636, 1636, 10373, 10373, 10373, 10373, 20568, 20935],\n", + " [31366, 20724, 20724, 9777, 15993, 15993, 15993, 15993, 31304,\n", + " 31304, 31304, 3360, 3219, 3219, 3219, 3219, 3219, 3219,\n", + " 3219, 27794, 27794, 27794, 27794, 27794, 31127, 15901, 15901,\n", + " 15901, 15901, 15901, 31788, 31788, 31788, 31788, 15953, 15953,\n", + " 15953, 6993, 6993, 428, 428, 428, 21954, 21954, 21954,\n", + " 21954, 21954, 21954, 1261, 5045, 5045, 16239, 16376, 16376,\n", + " 16376, 16376, 16376, 16376, 21742, 23058, 23058, 12370, 12370,\n", + " 12370, 12370, 8271, 8271, 8271, 11137, 30457, 27482, 27482],\n", + " [ 647, 11816, 11816, 9976, 9976, 22952, 22952, 22952, 22952,\n", + " 22952, 16823, 31370, 31370, 31370, 31370, 31370, 14620, 3823,\n", + " 3823, 28695, 28695, 28695, 28695, 25197, 5007, 5007, 16911,\n", + " 16911, 16911, 29522, 29522, 29522, 29522, 29522, 1503, 3071,\n", + " 1503, 3071, 1503, 24004, 24004, 24004, 3800, 3800, 3800,\n", + " 24549, 21318, 24549, 24549, 24549, 30488, 
30488, 30488, 30488,\n", + " 28521, 1372, 1372, 1372, 1372, 699, 699, 30955, 30955,\n", + " 23399, 23399, 23399, 13372, 16826, 31950, 31950, 31950, 7102]],\n", + " dtype=int32), 10.373663, 0.0]" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 3125/3125 [20:10<00:00, 2.58it/s, accuracy=0.256, cost=4.63]\n", + "minibatch loop: 100%|██████████| 79/79 [00:14<00:00, 5.51it/s, accuracy=0.317, cost=4.11]\n", + "minibatch loop: 0%| | 0/3125 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.055380445" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + 
"version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/46.transformer-xl.ipynb b/neural-machine-translation/46.transformer-xl.ipynb deleted file mode 100644 index 68f8a2a..0000000 --- a/neural-machine-translation/46.transformer-xl.ipynb +++ /dev/null @@ -1,1104 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - 
"execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - 
"execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "learning_rate = 1e-4\n", - "batch_size = 16\n", - "epoch = 20\n", - "n_layer = 3\n", - "d_model = 256\n", - "d_embed = 256\n", - "n_head = 10\n", - "d_head = 50\n", - "d_inner = 512" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "def positional_embedding(pos_seq, inv_freq, bsz = None):\n", - " sinusoid_inp = tf.einsum('i,j->ij', pos_seq, inv_freq)\n", - " pos_emb = tf.concat([tf.sin(sinusoid_inp), tf.cos(sinusoid_inp)], -1)\n", - " if bsz is not None:\n", - " return tf.tile(pos_emb[:, None, :], [1, bsz, 1])\n", - " else:\n", - " return pos_emb[:, None, :]\n", - "\n", - "\n", - "def positionwise_FF(inp, d_model, d_inner, 
kernel_initializer, scope = 'ff'):\n", - " output = inp\n", - " with tf.variable_scope(scope):\n", - " output = tf.layers.dense(\n", - " inp,\n", - " d_inner,\n", - " activation = tf.nn.relu,\n", - " kernel_initializer = kernel_initializer,\n", - " name = 'layer_1',\n", - " )\n", - " output = tf.layers.dense(\n", - " output,\n", - " d_model,\n", - " kernel_initializer = kernel_initializer,\n", - " name = 'layer_2',\n", - " )\n", - " output = tf.contrib.layers.layer_norm(\n", - " output + inp, begin_norm_axis = -1\n", - " )\n", - " return output\n", - "\n", - "\n", - "def rel_shift(x):\n", - " x_size = tf.shape(x)\n", - "\n", - " x = tf.pad(x, [[0, 0], [1, 0], [0, 0], [0, 0]])\n", - " x = tf.reshape(x, [x_size[1] + 1, x_size[0], x_size[2], x_size[3]])\n", - " x = tf.slice(x, [1, 0, 0, 0], [-1, -1, -1, -1])\n", - " x = tf.reshape(x, x_size)\n", - "\n", - " return x\n", - "\n", - "\n", - "def rel_multihead_attn(\n", - " w,\n", - " r,\n", - " r_w_bias,\n", - " r_r_bias,\n", - " attn_mask,\n", - " mems,\n", - " d_model,\n", - " n_head,\n", - " d_head,\n", - " kernel_initializer,\n", - " scope = 'rel_attn',\n", - "):\n", - " scale = 1 / (d_head ** 0.5)\n", - " with tf.variable_scope(scope):\n", - " qlen = tf.shape(w)[0]\n", - " rlen = tf.shape(r)[0]\n", - " bsz = tf.shape(w)[1]\n", - "\n", - " cat = (\n", - " tf.concat([mems, w], 0)\n", - " if mems is not None and mems.shape.ndims > 1\n", - " else w\n", - " )\n", - " w_heads = tf.layers.dense(\n", - " cat,\n", - " 3 * n_head * d_head,\n", - " use_bias = False,\n", - " kernel_initializer = kernel_initializer,\n", - " name = 'qkv',\n", - " )\n", - " r_head_k = tf.layers.dense(\n", - " r,\n", - " n_head * d_head,\n", - " use_bias = False,\n", - " kernel_initializer = kernel_initializer,\n", - " name = 'r',\n", - " )\n", - "\n", - " w_head_q, w_head_k, w_head_v = tf.split(w_heads, 3, -1)\n", - " w_head_q = w_head_q[-qlen:]\n", - "\n", - " klen = tf.shape(w_head_k)[0]\n", - "\n", - " w_head_q = tf.reshape(w_head_q, [qlen, 
bsz, n_head, d_head])\n", - " w_head_k = tf.reshape(w_head_k, [klen, bsz, n_head, d_head])\n", - " w_head_v = tf.reshape(w_head_v, [klen, bsz, n_head, d_head])\n", - "\n", - " r_head_k = tf.reshape(r_head_k, [rlen, n_head, d_head])\n", - "\n", - " rw_head_q = w_head_q + r_w_bias\n", - " rr_head_q = w_head_q + r_r_bias\n", - "\n", - " AC = tf.einsum('ibnd,jbnd->ijbn', rw_head_q, w_head_k)\n", - " BD = tf.einsum('ibnd,jnd->ijbn', rr_head_q, r_head_k)\n", - " BD = rel_shift(BD)\n", - " \n", - " paddings = tf.fill(tf.shape(BD), float('-inf'))\n", - "\n", - " attn_score = (AC + BD) * scale\n", - " attn_mask_t = attn_mask[:, :, None, None]\n", - " attn_score = attn_score * (1 - attn_mask_t) - 1e30 * attn_mask_t\n", - "\n", - " attn_prob = tf.nn.softmax(attn_score, 1)\n", - " attn_vec = tf.einsum('ijbn,jbnd->ibnd', attn_prob, w_head_v)\n", - " size_t = tf.shape(attn_vec)\n", - " attn_vec = tf.reshape(attn_vec, [size_t[0], size_t[1], n_head * d_head])\n", - "\n", - " attn_out = tf.layers.dense(\n", - " attn_vec,\n", - " d_model,\n", - " use_bias = False,\n", - " kernel_initializer = kernel_initializer,\n", - " name = 'o',\n", - " )\n", - "\n", - " output = tf.contrib.layers.layer_norm(\n", - " attn_out + w, begin_norm_axis = -1\n", - " )\n", - " return output\n", - "\n", - "\n", - "def embedding_lookup(lookup_table, x):\n", - " return tf.nn.embedding_lookup(lookup_table, x)\n", - "\n", - "\n", - "def mask_adaptive_embedding_lookup(\n", - " x,\n", - " n_token,\n", - " d_embed,\n", - " d_proj,\n", - " cutoffs,\n", - " initializer,\n", - " proj_initializer,\n", - " div_val = 1,\n", - " proj_same_dim = True,\n", - " scope = 'adaptive_embed',\n", - " **kwargs\n", - "):\n", - " emb_scale = d_proj ** 0.5\n", - " with tf.variable_scope(scope):\n", - " if div_val == 1:\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table', [n_token, d_embed], initializer = initializer\n", - " )\n", - " y = embedding_lookup(lookup_table, x)\n", - " if d_proj != d_embed:\n", - " proj_W = 
tf.get_variable(\n", - " 'proj_W', [d_embed, d_proj], initializer = proj_initializer\n", - " )\n", - " y = tf.einsum('ibe,ed->ibd', y, proj_W)\n", - " else:\n", - " proj_W = None\n", - " ret_params = [lookup_table, proj_W]\n", - " else:\n", - " tables, projs = [], []\n", - " cutoff_ends = [0] + cutoffs + [n_token]\n", - " x_size = tf.shape(x)\n", - " y = tf.zeros([x_size[0], x_size[1], d_proj])\n", - " for i in range(len(cutoff_ends) - 1):\n", - " with tf.variable_scope('cutoff_{}'.format(i)):\n", - " l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]\n", - " mask = (x >= l_idx) & (x < r_idx)\n", - " cur_x = tf.boolean_mask(x, mask) - l_idx\n", - " cur_d_embed = d_embed // (div_val ** i)\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table',\n", - " [r_idx - l_idx, cur_d_embed],\n", - " initializer = initializer,\n", - " )\n", - " cur_y = embedding_lookup(lookup_table, cur_x)\n", - " if d_proj == cur_d_embed and not proj_same_dim:\n", - " proj_W = None\n", - " else:\n", - " proj_W = tf.get_variable(\n", - " 'proj_W',\n", - " [cur_d_embed, d_proj],\n", - " initializer = proj_initializer,\n", - " )\n", - " cur_y = tf.einsum('id,de->ie', cur_y, proj_W)\n", - " mask_idx = tf.to_int64(tf.where(mask))\n", - " y += tf.scatter_nd(\n", - " mask_idx, cur_y, tf.to_int64(tf.shape(y))\n", - " )\n", - " tables.append(lookup_table)\n", - " projs.append(proj_W)\n", - " ret_params = [tables, projs]\n", - "\n", - " y *= emb_scale\n", - " return y, ret_params\n", - "\n", - "\n", - "def mul_adaptive_embedding_lookup(\n", - " x,\n", - " n_token,\n", - " d_embed,\n", - " d_proj,\n", - " cutoffs,\n", - " initializer,\n", - " proj_initializer,\n", - " div_val = 1,\n", - " perms = None,\n", - " proj_same_dim = True,\n", - " scope = 'adaptive_embed',\n", - "):\n", - " \"\"\"\n", - " perms: If None, first compute W = W1 x W2 (projection for each bin),\n", - " and then compute X x W (embedding lookup). 
If not None,\n", - " use bin-based embedding lookup with max_bin_size defined by\n", - " the shape of perms.\n", - " \"\"\"\n", - " emb_scale = d_proj ** 0.5\n", - " with tf.variable_scope(scope):\n", - " if div_val == 1:\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table', [n_token, d_embed], initializer = initializer\n", - " )\n", - " y = embedding_lookup(lookup_table, x)\n", - " if d_proj != d_embed:\n", - " proj_W = tf.get_variable(\n", - " 'proj_W', [d_embed, d_proj], initializer = proj_initializer\n", - " )\n", - " y = tf.einsum('ibe,ed->ibd', y, proj_W)\n", - " else:\n", - " proj_W = None\n", - " ret_params = [lookup_table, proj_W]\n", - " else:\n", - " tables, projs = [], []\n", - " cutoff_ends = [0] + cutoffs + [n_token]\n", - " x_size = tf.shape(x)\n", - " if perms is None:\n", - " cat_lookup = []\n", - " else:\n", - " cat_lookup = tf.zeros([x_size[0], x_size[1], d_proj])\n", - " for i in range(len(cutoff_ends) - 1):\n", - " with tf.variable_scope('cutoff_{}'.format(i)):\n", - " l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]\n", - " cur_d_embed = d_embed // (div_val ** i)\n", - " lookup_table = tf.get_variable(\n", - " 'lookup_table',\n", - " [r_idx - l_idx, cur_d_embed],\n", - " initializer = initializer,\n", - " )\n", - " if cur_d_embed == d_proj and not proj_same_dim:\n", - " proj_W = None\n", - " else:\n", - " proj_W = tf.get_variable(\n", - " 'proj_W',\n", - " [cur_d_embed, d_proj],\n", - " initializer = proj_initializer,\n", - " )\n", - " if perms is None:\n", - " cat_lookup.append(\n", - " tf.einsum('ie,ed->id', lookup_table, proj_W)\n", - " )\n", - " else:\n", - " # speed up the computation of the first bin\n", - " # also save some meory\n", - " if i == 0:\n", - " cur_y = embedding_lookup(\n", - " lookup_table, tf.minimum(x, r_idx - 1)\n", - " )\n", - " if proj_W is not None:\n", - " cur_y = tf.einsum('ibe,ed->ibd', cur_y, proj_W)\n", - " cur_y *= perms[i][:, :, None]\n", - " cat_lookup += cur_y\n", - " else:\n", - " cur_x = 
tf.einsum(\n", - " 'ib,ibk->k', tf.to_float(x - l_idx), perms[i]\n", - " )\n", - " cur_x = tf.to_int32(cur_x)\n", - " cur_y = embedding_lookup(lookup_table, cur_x)\n", - " if proj_W is not None:\n", - " cur_y = tf.einsum('ke,ed->kd', cur_y, proj_W)\n", - " cat_lookup += tf.einsum(\n", - " 'kd,ibk->ibd', cur_y, perms[i]\n", - " )\n", - " tables.append(lookup_table)\n", - " projs.append(proj_W)\n", - " if perms is None:\n", - " cat_lookup = tf.concat(cat_lookup, 0)\n", - " y = embedding_lookup(cat_lookup, x)\n", - " else:\n", - " y = cat_lookup\n", - " ret_params = [tables, projs]\n", - "\n", - " y *= emb_scale\n", - " return y, ret_params\n", - "\n", - "\n", - "def mask_adaptive_logsoftmax(\n", - " hidden,\n", - " target,\n", - " n_token,\n", - " d_embed,\n", - " d_proj,\n", - " cutoffs,\n", - " params,\n", - " tie_projs,\n", - " initializer = None,\n", - " proj_initializer = None,\n", - " div_val = 1,\n", - " scope = 'adaptive_softmax',\n", - " proj_same_dim = True,\n", - " return_mean = True,\n", - " **kwargs\n", - "):\n", - " def _logit(x, W, b, proj):\n", - " y = x\n", - " if proj is not None:\n", - " y = tf.einsum('ibd,ed->ibe', y, proj)\n", - " return tf.einsum('ibd,nd->ibn', y, W) + b\n", - "\n", - " params_W, params_projs = params[0], params[1]\n", - "\n", - " def _gather_logprob(logprob, target):\n", - " lp_size = tf.shape(logprob)\n", - " r = tf.range(lp_size[0])\n", - " idx = tf.stack([r, target], 1)\n", - " return tf.gather_nd(logprob, idx)\n", - "\n", - " with tf.variable_scope(scope):\n", - " if len(cutoffs) == 0:\n", - " softmax_b = tf.get_variable(\n", - " 'bias', [n_token], initializer = tf.zeros_initializer()\n", - " )\n", - " output = _logit(hidden, params_W, softmax_b, params_projs)\n", - " nll = tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " labels = target, logits = output\n", - " )\n", - " else:\n", - " cutoff_ends = [0] + cutoffs + [n_token]\n", - " nll = tf.zeros_like(target, dtype = tf.float32)\n", - " for i in 
range(len(cutoff_ends) - 1):\n", - " with tf.variable_scope('cutoff_{}'.format(i)):\n", - " l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]\n", - " mask = (target >= l_idx) & (target < r_idx)\n", - " mask_idx = tf.where(mask)\n", - " cur_target = tf.boolean_mask(target, mask) - l_idx\n", - " cur_d_embed = d_embed // (div_val ** i)\n", - "\n", - " if div_val == 1:\n", - " cur_W = params_W[l_idx:r_idx]\n", - " else:\n", - " cur_W = params_W[i]\n", - " cur_b = tf.get_variable(\n", - " 'b',\n", - " [r_idx - l_idx],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - " if tie_projs[i]:\n", - " if div_val == 1:\n", - " cur_proj = params_projs\n", - " else:\n", - " cur_proj = params_projs[i]\n", - " else:\n", - " if (\n", - " div_val == 1 or not proj_same_dim\n", - " ) and d_proj == cur_d_embed:\n", - " cur_proj = None\n", - " else:\n", - " cur_proj = tf.get_variable(\n", - " 'proj',\n", - " [cur_d_embed, d_proj],\n", - " initializer = proj_initializer,\n", - " )\n", - " if i == 0:\n", - " cluster_W = tf.get_variable(\n", - " 'cluster_W',\n", - " [len(cutoffs), d_embed],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - " cluster_b = tf.get_variable(\n", - " 'cluster_b',\n", - " [len(cutoffs)],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - " cur_W = tf.concat([cur_W, cluster_W], 0)\n", - " cur_b = tf.concat([cur_b, cluster_b], 0)\n", - "\n", - " head_logit = _logit(hidden, cur_W, cur_b, cur_proj)\n", - " head_logprob = tf.nn.log_softmax(head_logit)\n", - " cur_head_logprob = tf.boolean_mask(head_logprob, mask)\n", - " cur_logprob = _gather_logprob(\n", - " cur_head_logprob, cur_target\n", - " )\n", - " else:\n", - " cur_head_logprob = tf.boolean_mask(head_logprob, mask)\n", - " cur_hidden = tf.boolean_mask(hidden, mask)\n", - " tail_logit = tf.squeeze(\n", - " _logit(cur_hidden[None], cur_W, cur_b, cur_proj), 0\n", - " )\n", - " tail_logprob = tf.nn.log_softmax(tail_logit)\n", - " cur_logprob = cur_head_logprob[\n", - " :, 
cutoff_ends[1] + i - 1\n", - " ] + _gather_logprob(tail_logprob, cur_target)\n", - " nll += tf.scatter_nd(\n", - " mask_idx, -cur_logprob, tf.to_int64(tf.shape(nll))\n", - " )\n", - " if return_mean:\n", - " nll = tf.reduce_mean(nll)\n", - " return nll\n", - "\n", - "\n", - "def mul_adaptive_logsoftmax(\n", - " hidden,\n", - " target,\n", - " n_token,\n", - " d_embed,\n", - " d_proj,\n", - " cutoffs,\n", - " params,\n", - " tie_projs,\n", - " initializer = None,\n", - " proj_initializer = None,\n", - " div_val = 1,\n", - " perms = None,\n", - " proj_same_dim = True,\n", - " scope = 'adaptive_softmax',\n", - " **kwargs\n", - "):\n", - " def _logit(x, W, b, proj):\n", - " y = x\n", - " if x.shape.ndims == 3:\n", - " if proj is not None:\n", - " y = tf.einsum('ibd,ed->ibe', y, proj)\n", - " return tf.einsum('ibd,nd->ibn', y, W) + b\n", - " else:\n", - " if proj is not None:\n", - " y = tf.einsum('id,ed->ie', y, proj)\n", - " return tf.einsum('id,nd->in', y, W) + b\n", - "\n", - " params_W, params_projs = params[0], params[1]\n", - "\n", - " with tf.variable_scope(scope):\n", - " if len(cutoffs) == 0:\n", - " softmax_b = tf.get_variable(\n", - " 'bias', [n_token], initializer = tf.zeros_initializer()\n", - " )\n", - " output = _logit(hidden, params_W, softmax_b, params_projs)\n", - " nll = tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " labels = target, logits = output\n", - " )\n", - " nll = tf.reduce_mean(nll)\n", - " else:\n", - " total_loss, total_cnt = 0, 0\n", - " cutoff_ends = [0] + cutoffs + [n_token]\n", - " for i in range(len(cutoff_ends) - 1):\n", - " with tf.variable_scope('cutoff_{}'.format(i)):\n", - " l_idx, r_idx = cutoff_ends[i], cutoff_ends[i + 1]\n", - "\n", - " cur_d_embed = d_embed // (div_val ** i)\n", - "\n", - " if div_val == 1:\n", - " cur_W = params_W[l_idx:r_idx]\n", - " else:\n", - " cur_W = params_W[i]\n", - " cur_b = tf.get_variable(\n", - " 'b',\n", - " [r_idx - l_idx],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - 
" if tie_projs[i]:\n", - " if div_val == 1:\n", - " cur_proj = params_projs\n", - " else:\n", - " cur_proj = params_projs[i]\n", - " else:\n", - " if (\n", - " div_val == 1 or not proj_same_dim\n", - " ) and d_proj == cur_d_embed:\n", - " cur_proj = None\n", - " else:\n", - " cur_proj = tf.get_variable(\n", - " 'proj',\n", - " [cur_d_embed, d_proj],\n", - " initializer = proj_initializer,\n", - " )\n", - "\n", - " if i == 0:\n", - " cluster_W = tf.get_variable(\n", - " 'cluster_W',\n", - " [len(cutoffs), d_embed],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - " cluster_b = tf.get_variable(\n", - " 'cluster_b',\n", - " [len(cutoffs)],\n", - " initializer = tf.zeros_initializer(),\n", - " )\n", - " cur_W = tf.concat([cur_W, cluster_W], 0)\n", - " cur_b = tf.concat([cur_b, cluster_b], 0)\n", - "\n", - " head_logit = _logit(hidden, cur_W, cur_b, cur_proj)\n", - "\n", - " head_target = kwargs.get('head_target')\n", - " head_nll = tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " labels = head_target, logits = head_logit\n", - " )\n", - "\n", - " masked_loss = head_nll * perms[i]\n", - " total_loss += tf.reduce_sum(masked_loss)\n", - " total_cnt += tf.reduce_sum(perms[i])\n", - "\n", - " # head_logprob = tf.nn.log_softmax(head_logit)\n", - "\n", - " # final_logprob = head_logprob * perms[i][:, :, None]\n", - " # final_target = tf.one_hot(target, tf.shape(head_logprob)[2])\n", - " # total_loss -= tf.einsum('ibn,ibn->', final_logprob, final_target)\n", - " # total_cnt += tf.reduce_sum(perms[i])\n", - " else:\n", - " cur_head_nll = tf.einsum(\n", - " 'ib,ibk->k', head_nll, perms[i]\n", - " )\n", - "\n", - " cur_hidden = tf.einsum('ibd,ibk->kd', hidden, perms[i])\n", - " tail_logit = _logit(cur_hidden, cur_W, cur_b, cur_proj)\n", - "\n", - " tail_target = tf.einsum(\n", - " 'ib,ibk->k', tf.to_float(target - l_idx), perms[i]\n", - " )\n", - " tail_nll = tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " labels = tf.to_int32(tail_target),\n", - " 
logits = tail_logit,\n", - " )\n", - "\n", - " sum_nll = cur_head_nll + tail_nll\n", - " mask = tf.reduce_sum(perms[i], [0, 1])\n", - "\n", - " masked_loss = sum_nll * mask\n", - " total_loss += tf.reduce_sum(masked_loss)\n", - " total_cnt += tf.reduce_sum(mask)\n", - "\n", - " nll = total_loss / total_cnt\n", - "\n", - " return nll\n", - "\n", - "\n", - "def _create_mask(qlen, mlen, same_length = False):\n", - " attn_mask = tf.ones([qlen, qlen])\n", - " mask_u = tf.matrix_band_part(attn_mask, 0, -1)\n", - " mask_dia = tf.matrix_band_part(attn_mask, 0, 0)\n", - " attn_mask_pad = tf.zeros([qlen, mlen])\n", - " ret = tf.concat([attn_mask_pad, mask_u - mask_dia], 1)\n", - " if same_length:\n", - " mask_l = tf.matrix_band_part(attn_mask, -1, 0)\n", - " ret = tf.concat([ret[:, :qlen] + mask_l - mask_dia, ret[:, qlen:]], 1)\n", - " return ret\n", - "\n", - "\n", - "def _cache_mem(curr_out, prev_mem, mem_len = None):\n", - " if mem_len is None or prev_mem is None:\n", - " new_mem = curr_out\n", - " elif mem_len == 0:\n", - " return prev_mem\n", - " else:\n", - " new_mem = tf.concat([prev_mem, curr_out], 0)[-mem_len:]\n", - "\n", - " return tf.stop_gradient(new_mem)\n", - "\n", - "\n", - "def transformer(\n", - " dec_inp,\n", - " mems,\n", - " n_token,\n", - " n_layer,\n", - " d_model,\n", - " d_embed,\n", - " n_head,\n", - " d_head,\n", - " d_inner,\n", - " initializer,\n", - " proj_initializer = None,\n", - " mem_len = None,\n", - " cutoffs = [],\n", - " div_val = 1,\n", - " tie_projs = [],\n", - " same_length = False,\n", - " clamp_len = -1,\n", - " untie_r = False,\n", - " proj_same_dim = True,\n", - " scope = 'transformer',\n", - " reuse = tf.AUTO_REUSE\n", - "):\n", - " \"\"\"\n", - " cutoffs: a list of python int. Cutoffs for adaptive softmax.\n", - " tie_projs: a list of python bools. Whether to tie the projections.\n", - " perms: a list of tensors. 
Each tensor should of size [len, bsz, bin_size].\n", - " Only used in the adaptive setting.\n", - " \"\"\"\n", - " new_mems = []\n", - " with tf.variable_scope(scope,reuse=reuse):\n", - " if untie_r:\n", - " r_w_bias = tf.get_variable(\n", - " 'r_w_bias', [n_layer, n_head, d_head], initializer = initializer\n", - " )\n", - " r_r_bias = tf.get_variable(\n", - " 'r_r_bias', [n_layer, n_head, d_head], initializer = initializer\n", - " )\n", - " else:\n", - " r_w_bias = tf.get_variable(\n", - " 'r_w_bias', [n_head, d_head], initializer = initializer\n", - " )\n", - " r_r_bias = tf.get_variable(\n", - " 'r_r_bias', [n_head, d_head], initializer = initializer\n", - " )\n", - "\n", - " qlen = tf.shape(dec_inp)[0]\n", - " mlen = tf.shape(mems[0])[0] if mems is not None else 0\n", - " klen = mlen + qlen\n", - "\n", - " if proj_initializer is None:\n", - " proj_initializer = initializer\n", - " lookup_fn = mask_adaptive_embedding_lookup\n", - " embeddings, shared_params = lookup_fn(\n", - " x = dec_inp,\n", - " n_token = n_token,\n", - " d_embed = d_embed,\n", - " d_proj = d_model,\n", - " cutoffs = cutoffs,\n", - " initializer = initializer,\n", - " proj_initializer = proj_initializer,\n", - " div_val = div_val,\n", - " proj_same_dim = proj_same_dim,\n", - " )\n", - "\n", - " attn_mask = _create_mask(qlen, mlen, same_length)\n", - "\n", - " pos_seq = tf.range(klen - 1, -1, -1.0)\n", - " if clamp_len > 0:\n", - " pos_seq = tf.minimum(pos_seq, clamp_len)\n", - " inv_freq = 1 / (10000 ** (tf.range(0, d_model, 2.0) / d_model))\n", - " pos_emb = positional_embedding(pos_seq, inv_freq)\n", - "\n", - " if mems is None:\n", - " mems = [None] * n_layer\n", - " output = embeddings\n", - " for i in range(n_layer):\n", - " # cache new mems\n", - " new_mems.append(_cache_mem(output, mems[i], mem_len))\n", - "\n", - " with tf.variable_scope('layer_{}'.format(i)):\n", - " output = rel_multihead_attn(\n", - " w = output,\n", - " r = pos_emb,\n", - " r_w_bias = r_w_bias if not untie_r else 
r_w_bias[i],\n", - " r_r_bias = r_r_bias if not untie_r else r_r_bias[i],\n", - " attn_mask = attn_mask,\n", - " mems = mems[i],\n", - " d_model = d_model,\n", - " n_head = n_head,\n", - " d_head = d_head,\n", - " kernel_initializer = initializer,\n", - " )\n", - " output = positionwise_FF(\n", - " inp = output,\n", - " d_model = d_model,\n", - " d_inner = d_inner,\n", - " kernel_initializer = initializer,\n", - " )\n", - "\n", - " return output, new_mems" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self):\n", - "\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - "\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " initializer = tf.initializers.random_normal(stddev = 0.1)\n", - "\n", - " def forward(x, y, reuse = tf.AUTO_REUSE):\n", - " memory = tf.fill(\n", - " [n_layer, tf.shape(x)[0], tf.shape(x)[1], d_model], PAD\n", - " )\n", - " memory = tf.cast(memory, tf.float32)\n", - " logits, next_memory = transformer(\n", - " x,\n", - " memory,\n", - " len(dictionary_from),\n", - " n_layer,\n", - " d_model,\n", - " d_embed,\n", - " n_head,\n", - " d_head,\n", - " d_inner,\n", - " initializer,\n", - " scope='encoder',\n", - " reuse=reuse\n", - " )\n", - " logits, next_memory = transformer(\n", - " x,\n", - " next_memory,\n", - " len(dictionary_to),\n", - " n_layer,\n", - " d_model,\n", - " d_embed,\n", - " n_head,\n", - " d_head,\n", - " d_inner,\n", - " initializer,\n", - " scope='decoder',\n", - " reuse=reuse\n", - " )\n", - " logits = transformer(\n", - " y,\n", - " next_memory,\n", - " len(dictionary_to),\n", - " 
n_layer,\n", - " d_model,\n", - " d_embed,\n", - " n_head,\n", - " d_head,\n", - " d_inner,\n", - " initializer,\n", - " scope='decoder_1',\n", - " reuse=reuse\n", - " )[0]\n", - " return tf.layers.dense(logits, len(dictionary_from), reuse=tf.AUTO_REUSE)\n", - " self.training_logits = forward(self.X, decoder_input)\n", - " \n", - " def cond(i, y, temp):\n", - " return i < tf.reduce_max(tf.shape(self.X)[1])\n", - " \n", - " def body(i, y, temp):\n", - " logits = forward(self.X, y,reuse=True)\n", - " ids = tf.argmax(logits, -1)[:, i]\n", - " ids = tf.expand_dims(ids, -1)\n", - " temp = tf.concat([temp[:, 1:], ids], -1)\n", - " y = tf.concat([temp[:, -(i+1):], temp[:, :-(i+1)]], -1)\n", - " y = tf.reshape(y, [tf.shape(temp)[0], tf.shape(self.X)[1]])\n", - " i += 1\n", - " return i, y, temp\n", - " \n", - " target = tf.fill([batch_size, tf.shape(self.X)[1]], GO)\n", - " target = tf.cast(target, tf.int64)\n", - " self.target = target\n", - " \n", - " _, self.predicting_ids, _ = tf.while_loop(cond, body, \n", - " [tf.constant(0), target, target])\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, maxlen_answer, dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot()\n", - 
"sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 7.368046, avg accuracy: 0.030184\n", - "epoch: 2, avg loss: 6.681681, avg accuracy: 0.044937\n", - "epoch: 3, avg loss: 6.353854, avg accuracy: 0.046410\n", - "epoch: 4, avg loss: 6.194322, avg accuracy: 0.054326\n", - "epoch: 5, avg loss: 6.122391, avg accuracy: 0.067359\n", - "epoch: 6, avg loss: 6.000588, avg accuracy: 0.082808\n", - "epoch: 7, avg loss: 5.978110, avg accuracy: 0.087615\n", - "epoch: 8, avg loss: 5.963039, avg accuracy: 0.086300\n", - "epoch: 9, avg loss: 5.949830, avg accuracy: 0.090680\n", - "epoch: 10, avg loss: 5.933142, avg accuracy: 0.089480\n", - "epoch: 11, avg loss: 5.913042, avg accuracy: 0.090779\n", - "epoch: 12, avg loss: 5.910972, avg accuracy: 0.091650\n", - "epoch: 13, avg loss: 5.897120, avg accuracy: 0.095094\n", - "epoch: 14, avg loss: 5.865714, avg accuracy: 0.102273\n", - "epoch: 15, avg loss: 5.853282, avg accuracy: 0.101984\n", - "epoch: 16, avg loss: 5.828940, avg accuracy: 0.102335\n", - "epoch: 17, avg loss: 5.814208, avg accuracy: 0.101593\n", - "epoch: 18, avg loss: 5.786762, avg accuracy: 0.103044\n", - "epoch: 19, avg loss: 5.774316, avg accuracy: 0.109873\n", - "epoch: 20, avg loss: 5.728468, avg accuracy: 0.114907\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " 
total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: we restarted from our strengths , and at the same time we restarted from his strengths .\n", - "REAL ANSWER: chúng tôi bắt đầu lại từ nghị lực , đồng thời , bắt đầu lại từ khả năng của bé .\n", - "PREDICTED ANSWER: tôi tôi , , chúng tôi tôi , và , , tôi chúng , , tôi . . , . . và . , . . , . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . \n", - "\n", - "row 2\n", - "QUESTION: but we didn 't just fly it . we were flying at 100 meters above the top of the canopy to measure this molecule -- incredibly dangerous stuff .\n", - "REAL ANSWER: chúng tôi không chỉ bay . chúng tôi bay cách tầng vòm của rừng 100 mét để đo đạc phân tử này -- chuyện vô cùng nguy hiểm .\n", - "PREDICTED ANSWER: tôi tôi , tôi chúng tôi và , và , , tôi chúng . tôi tôi . . , . . chúng . , . . , . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
\n", - "\n", - "row 3\n", - "QUESTION: so this stuff is just beginning .\n", - "REAL ANSWER: vậy đây chỉ mới là sự bắt đầu .\n", - "PREDICTED ANSWER: tôi tôi , tôi chúng tôi và , và . , tôi chúng . tôi và . . , . . chúng . , . . , . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . \n", - "\n", - "row 4\n", - "QUESTION: when you have nanotechnology and nanoscience , what 's occurred is that we 're able to now look at atoms and molecules and actually control them for great benefits .\n", - "REAL ANSWER: khi chúng ta có kĩ thuật vi phân tử và khoa học vi phân tử chúng ta có thể nhìn thấy nguyên tử và phân tử và có thể điều khiển chúng để đem lại nhiều lợi ích\n", - "PREDICTED ANSWER: tôi tôi , tôi chúng , và , và . , tôi chúng . , và . . , . . chúng . , . . , . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/47.attention-is-all-you-need-beam-search.ipynb b/neural-machine-translation/47.attention-is-all-you-need-beam-search.ipynb deleted file mode 100644 index 17983a1..0000000 --- a/neural-machine-translation/47.attention-is-all-you-need-beam-search.ipynb +++ /dev/null @@ -1,640 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, 
_ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most 
common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - "\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " \n", - " outputs = gamma * normalized + beta\n", - " return outputs\n", - "\n", - "\n", - "def multihead_attn(queries, keys, q_masks, k_masks, future_binding, num_units, num_heads):\n", - " \n", - " T_q = tf.shape(queries)[1] \n", - " T_k = tf.shape(keys)[1] \n", - "\n", - " Q = tf.layers.dense(queries, num_units, name='Q') \n", - " K_V = tf.layers.dense(keys, 2*num_units, name='K_V') \n", - 
" K, V = tf.split(K_V, 2, -1) \n", - "\n", - " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), axis=0) \n", - " K_ = tf.concat(tf.split(K, num_heads, axis=2), axis=0) \n", - " V_ = tf.concat(tf.split(V, num_heads, axis=2), axis=0) \n", - "\n", - " align = tf.matmul(Q_, tf.transpose(K_, [0,2,1])) \n", - " align = align / np.sqrt(K_.get_shape().as_list()[-1]) \n", - "\n", - " paddings = tf.fill(tf.shape(align), float('-inf')) \n", - "\n", - " key_masks = k_masks \n", - " key_masks = tf.tile(key_masks, [num_heads, 1]) \n", - " key_masks = tf.tile(tf.expand_dims(key_masks, 1), [1, T_q, 1]) \n", - " align = tf.where(tf.equal(key_masks, 0), paddings, align) \n", - "\n", - " if future_binding:\n", - " lower_tri = tf.ones([T_q, T_k]) \n", - " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense() \n", - " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0], 1, 1]) \n", - " align = tf.where(tf.equal(masks, 0), paddings, align) \n", - " \n", - " align = tf.nn.softmax(align) \n", - " query_masks = tf.to_float(q_masks) \n", - " query_masks = tf.tile(query_masks, [num_heads, 1]) \n", - " query_masks = tf.tile(tf.expand_dims(query_masks, -1), [1, 1, T_k]) \n", - " align *= query_masks \n", - " \n", - " outputs = tf.matmul(align, V_) \n", - " outputs = tf.concat(tf.split(outputs, num_heads, axis=0), axis=2) \n", - " outputs += queries \n", - " outputs = layer_norm(outputs) \n", - " return outputs\n", - "\n", - "\n", - "def pointwise_feedforward(inputs, hidden_units, activation=None):\n", - " outputs = tf.layers.dense(inputs, 4*hidden_units, activation=activation)\n", - " outputs = tf.layers.dense(outputs, hidden_units, activation=None)\n", - " outputs += inputs\n", - " outputs = layer_norm(outputs)\n", - " return outputs\n", - "\n", - "\n", - "def learned_position_encoding(inputs, mask, embed_dim):\n", - " T = tf.shape(inputs)[1]\n", - " outputs = tf.range(tf.shape(inputs)[1]) # (T_q)\n", - " outputs = tf.expand_dims(outputs, 0) # (1, T_q)\n", - " 
outputs = tf.tile(outputs, [tf.shape(inputs)[0], 1]) # (N, T_q)\n", - " outputs = embed_seq(outputs, T, embed_dim, zero_pad=False, scale=False)\n", - " return tf.expand_dims(tf.to_float(mask), -1) * outputs\n", - "\n", - "\n", - "def sinusoidal_position_encoding(inputs, mask, repr_dim):\n", - " T = tf.shape(inputs)[1]\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1]) * tf.expand_dims(tf.to_float(mask), -1)\n", - "\n", - "\n", - "def label_smoothing(inputs, epsilon=0.1):\n", - " C = inputs.get_shape().as_list()[-1]\n", - " return ((1 - epsilon) * inputs) + (epsilon / C)\n", - "\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, embedded_size, from_dict_size, to_dict_size, learning_rate,\n", - " num_blocks = 2,\n", - " num_heads = 8,\n", - " min_freq = 50):\n", - " self.X = tf.placeholder(tf.int32,[None,None])\n", - " self.Y = tf.placeholder(tf.int32,[None,None])\n", - " \n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " \n", - " def forward(x, y):\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, x)\n", - " en_masks = tf.sign(x)\n", - " encoder_embedded += sinusoidal_position_encoding(x, en_masks, embedded_size)\n", - " \n", - " for i in 
range(num_blocks):\n", - " with tf.variable_scope('encoder_self_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " encoder_embedded = multihead_attn(queries = encoder_embedded,\n", - " keys = encoder_embedded,\n", - " q_masks = en_masks,\n", - " k_masks = en_masks,\n", - " future_binding = False,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - "\n", - " with tf.variable_scope('encoder_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " encoder_embedded = pointwise_feedforward(encoder_embedded,\n", - " embedded_size,\n", - " activation = tf.nn.relu)\n", - " \n", - " decoder_embedded = tf.nn.embedding_lookup(decoder_embedding, y)\n", - " de_masks = tf.sign(y)\n", - " decoder_embedded += sinusoidal_position_encoding(y, de_masks, embedded_size)\n", - " \n", - " for i in range(num_blocks):\n", - " with tf.variable_scope('decoder_self_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = decoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = de_masks,\n", - " future_binding = True,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('decoder_attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = multihead_attn(queries = decoder_embedded,\n", - " keys = encoder_embedded,\n", - " q_masks = de_masks,\n", - " k_masks = en_masks,\n", - " future_binding = False,\n", - " num_units = size_layer,\n", - " num_heads = num_heads)\n", - " \n", - " with tf.variable_scope('decoder_feedforward_%d'%i,reuse=tf.AUTO_REUSE):\n", - " decoder_embedded = pointwise_feedforward(decoder_embedded,\n", - " embedded_size,\n", - " activation = tf.nn.relu)\n", - " \n", - " return tf.layers.dense(decoder_embedded, to_dict_size, reuse=tf.AUTO_REUSE)\n", - " \n", - " self.training_logits = forward(self.X, decoder_input)\n", - " \n", - " def cond(i, y, temp):\n", - " return i < 2 * tf.reduce_max(self.X_seq_len)\n", - " \n", - " def body(i, y, temp):\n", - " logits = 
forward(self.X, y)\n", - " ids = tf.argmax(logits, -1)[:, i]\n", - " ids = tf.expand_dims(ids, -1)\n", - " temp = tf.concat([temp[:, 1:], ids], -1)\n", - " y = tf.concat([temp[:, -(i+1):], temp[:, :-(i+1)]], -1)\n", - " y = tf.reshape(y, [tf.shape(temp)[0], 2 * tf.reduce_max(self.X_seq_len)])\n", - " i += 1\n", - " return i, y, temp\n", - " \n", - " target = tf.fill([batch_size, 2 * tf.reduce_max(self.X_seq_len)], GO)\n", - " target = tf.cast(target, tf.int64)\n", - " self.target = target\n", - " \n", - " _, self.predicting_ids, _ = tf.while_loop(cond, body, \n", - " [tf.constant(0), target, target])\n", - " self.logits = forward(self.X, self.Y)\n", - " self.k = tf.placeholder(dtype = tf.int32)\n", - " p = tf.nn.softmax(self.logits)\n", - " self.topk_logprobs, self.topk_ids = tf.nn.top_k(tf.log(p), self.k)\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "embedded_size = 256\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(embedded_size, embedded_size, 
len(dictionary_from), \n", - " len(dictionary_to), learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.432765, avg accuracy: 0.063002\n", - "epoch: 2, avg loss: 6.099248, avg accuracy: 0.073339\n", - "epoch: 3, avg loss: 5.883634, avg accuracy: 0.104186\n", - "epoch: 4, avg loss: 5.683232, avg accuracy: 0.117697\n", - "epoch: 5, avg loss: 5.563599, avg accuracy: 0.122433\n", - "epoch: 6, avg loss: 5.496640, avg accuracy: 0.124055\n", - "epoch: 7, avg loss: 5.468002, avg accuracy: 0.124314\n", - "epoch: 8, avg loss: 5.427497, avg accuracy: 0.125657\n", - "epoch: 9, avg loss: 5.485522, avg accuracy: 0.108804\n", - "epoch: 10, avg loss: 5.337589, avg accuracy: 0.124008\n", - "epoch: 11, avg loss: 5.198288, avg accuracy: 0.138992\n", - "epoch: 12, avg loss: 5.080966, avg accuracy: 0.140140\n", - "epoch: 
13, avg loss: 5.101196, avg accuracy: 0.137765\n", - "epoch: 14, avg loss: 5.125067, avg accuracy: 0.147141\n", - "epoch: 15, avg loss: 5.043302, avg accuracy: 0.147371\n", - "epoch: 16, avg loss: 4.968937, avg accuracy: 0.143341\n", - "epoch: 17, avg loss: 4.993334, avg accuracy: 0.147687\n", - "epoch: 18, avg loss: 4.935769, avg accuracy: 0.153807\n", - "epoch: 19, avg loss: 4.916368, avg accuracy: 0.149864\n", - "epoch: 20, avg loss: 4.897545, avg accuracy: 0.158205\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "class Hypothesis:\n", - " def __init__(self, log_prob, seq):\n", - " self.log_prob = log_prob\n", - " self.seq = seq\n", - "\n", - " @property\n", - " def step(self):\n", - " return len(self.seq) - 1\n", - "\n", - "\n", - "def beam_search(\n", - " batch_x,\n", - " beam_size,\n", - " num_ans = 5,\n", - " normalize_by_len = 1.0,\n", - "):\n", - " assert 0 <= normalize_by_len <= 1\n", - " batch_size = len(batch_x)\n", - " max_len = len(batch_x[0]) * 2\n", - " dec_inputs = np.ones((batch_size, 2), dtype=np.int32)\n", - " answers = [[] for i in range(batch_size)]\n", - " H = [[] for i in range(batch_size)]\n", - " \n", - " tkl, tkid = 
sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " for i in range(batch_size):\n", - " for j, log_prob in enumerate(tkl[i, 0]):\n", - " if tkid[i, 0, j] != EOS:\n", - " h = Hypothesis(log_prob, [1, tkid[i, 0, j]])\n", - " H[i].append(h)\n", - " H[i].sort(key=lambda h: h.log_prob)\n", - " \n", - " done = [False] * batch_size\n", - " while not all(done):\n", - " tkl_beam = []\n", - " tkid_beam = []\n", - " dec_inputs_beam = []\n", - " steps_beam = []\n", - " for i in range(beam_size):\n", - " steps = [1] * batch_size\n", - " prev_log_probs = np.zeros(batch_size, dtype=np.float32)\n", - " dec_inputs = np.ones((batch_size, max_len), dtype=np.int32)\n", - " for j, h in enumerate(H):\n", - " while h:\n", - " hi = h.pop()\n", - " lp, step, candidate_seq = hi.log_prob, hi.step, hi.seq\n", - " if candidate_seq[-1] != EOS:\n", - " dec_inputs[j, :len(candidate_seq)] = candidate_seq\n", - " steps[j] = step\n", - " prev_log_probs[j] = lp\n", - " break\n", - " else:\n", - " answers[j].append((lp, candidate_seq))\n", - " max_step = max(steps)\n", - " dec_inputs = dec_inputs[:, :max_step + 2]\n", - " tkl, tkid = sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " \n", - " tkl_beam.append(tkl + prev_log_probs[:, None, None])\n", - " tkid_beam.append(tkid)\n", - " dec_inputs_beam.append(dec_inputs.copy())\n", - " steps_beam.append(steps)\n", - " \n", - " for i in range(beam_size):\n", - " tkl = tkl_beam[i]\n", - " tkid = tkid_beam[i]\n", - " dec_inputs = dec_inputs_beam[i]\n", - " steps = steps_beam[i]\n", - " for j in range(batch_size):\n", - " step = steps[j]\n", - " for k in range(tkid.shape[2]):\n", - " extended_seq = np.hstack((dec_inputs[j, :step+1], [tkid[j, step, k]]))\n", - " log_prob = tkl[j, step, k]\n", - " if len(extended_seq) <= max_len and log_prob > 
-10:\n", - " h = Hypothesis(log_prob, extended_seq)\n", - " H[j].append(h)\n", - " H[j].sort(key=lambda h: h.log_prob / (h.step**normalize_by_len))\n", - " \n", - " for i in range(batch_size):\n", - " done[i] = (len(answers[i]) >= num_ans) or (not H[i]) or (len(H[i]) > 100)\n", - " \n", - " return answers " - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "beamed = beam_search(batch_x, 5)" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[]" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "beamed[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "ename": "ValueError", - "evalue": "max() arg is an empty sequence", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpredicted\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mb\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mb\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mbeamed\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mpredicted\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0;34m[\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mb\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mlambda\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mb\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mbeamed\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[0;31mValueError\u001b[0m: max() arg is an empty sequence" - ] - } - ], - "source": [ - "predicted = [max(b, key = lambda t: t[0])[1] for b in beamed]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/47.transformer-encoder-transformer-decoder.ipynb b/neural-machine-translation/47.transformer-encoder-transformer-decoder.ipynb new file mode 100644 index 0000000..aeba950 --- /dev/null 
+++ b/neural-machine-translation/47.transformer-encoder-transformer-decoder.ipynb @@ -0,0 +1,853 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [EOS] for i in train_Y]\n", + "test_Y = [i + [EOS] for i in test_Y]\n", + "train_X = [i + [EOS] for i in train_X]\n", + "test_X = [i + [EOS] for i in test_X]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "\n", + "BASE_PARAMS = defaultdict(\n", + " lambda: None, # Set default value to None.\n", + "\n", + " # Input params\n", + " default_batch_size=2048, # Maximum number of tokens per batch of examples.\n", + " default_batch_size_tpu=32768,\n", + " max_length=256, # Maximum number of tokens per example.\n", + "\n", + " # Model params\n", + " initializer_gain=1.0, # Used in trainable variable initialization.\n", + " vocab_size=vocab_size, # Number of tokens defined in the vocabulary file.\n", + " hidden_size=512, # Model dimension 
in the hidden layers.\n", + " num_hidden_layers=6, # Number of layers in the encoder and decoder stacks.\n", + " num_heads=8, # Number of heads to use in multi-headed attention.\n", + " filter_size=2048, # Inner layer dimension in the feedforward network.\n", + "\n", + " # Dropout values (only used when training)\n", + " layer_postprocess_dropout=0.1,\n", + " attention_dropout=0.1,\n", + " relu_dropout=0.1,\n", + "\n", + " # Training params\n", + " label_smoothing=0.1,\n", + " learning_rate=1.0,\n", + " learning_rate_decay_rate=1.0,\n", + " learning_rate_warmup_steps=16000,\n", + "\n", + " # Optimizer params\n", + " optimizer_adam_beta1=0.9,\n", + " optimizer_adam_beta2=0.997,\n", + " optimizer_adam_epsilon=1e-09,\n", + "\n", + " # Default prediction params\n", + " extra_decode_length=50,\n", + " beam_size=4,\n", + " alpha=0.6, # used to calculate length normalization in beam search\n", + "\n", + " # TPU specific parameters\n", + " use_tpu=False,\n", + " static_batch=False,\n", + " allow_ffn_pad=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:24: The name tf.layers.Layer is deprecated. 
Please use tf.compat.v1.layers.Layer instead.\n", + "\n" + ] + } + ], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "from transformer.transformer import Transformer\n", + "from transformer import utils\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, train = True, learning_rate = 1e-4):\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " model = Transformer(BASE_PARAMS, train)\n", + " self.training_logits = model(self.X, self.Y)\n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + "# self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + "# targets = self.Y,\n", + "# weights = masks)\n", + " \n", + " xentropy, weights = utils.padded_cross_entropy_loss(\n", + " self.training_logits, self.Y, BASE_PARAMS[\"label_smoothing\"], BASE_PARAMS[\"vocab_size\"])\n", + " self.cost = tf.reduce_sum(xentropy) / tf.reduce_sum(weights)\n", + " self.xentropy = xentropy\n", + " self.weights = weights\n", + " \n", + "# optimizer = tf.contrib.opt.LazyAdamOptimizer(\n", + "# BASE_PARAMS['learning_rate'],\n", + "# beta1=BASE_PARAMS[\"optimizer_adam_beta1\"],\n", + "# beta2=BASE_PARAMS[\"optimizer_adam_beta2\"],\n", + "# epsilon=BASE_PARAMS[\"optimizer_adam_epsilon\"])\n", + " \n", + " self.optimizer = 
tf.train.AdamOptimizer(learning_rate = learning_rate,\n", + " beta1=BASE_PARAMS[\"optimizer_adam_beta1\"],\n", + " beta2=BASE_PARAMS[\"optimizer_adam_beta2\"],\n", + " epsilon=BASE_PARAMS[\"optimizer_adam_epsilon\"]).minimize(self.cost)\n", + " \n", + "# global_step = tf.train.get_global_step()\n", + "# tvars = tf.trainable_variables()\n", + "# gradients = optimizer.compute_gradients(\n", + "# self.cost, tvars, colocate_gradients_with_ops=True)\n", + "# minimize_op = optimizer.apply_gradients(\n", + "# gradients, global_step=global_step, name=\"train\")\n", + "# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n", + "# self.optimizer = tf.group(minimize_op, update_ops)\n", + "\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " self.fast_result = model(self.X)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:39: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/transformer.py:84: The name tf.variable_scope is deprecated. 
Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:89: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:48: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:51: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/ffn_layer.py:65: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/ffn_layer.py:65: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:71: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/utils.py:82: The name tf.log is deprecated. 
Please use tf.math.log instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/beam_search.py:420: calling reduce_logsumexp_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "keep_dims is deprecated, use keepdims instead\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(train = False)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'outputs': array([[ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821,\n", + " 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821,\n", + " 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821,\n", + " 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821, 29821,\n", + " 29821, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 5789, 5789, 5789, 5789,\n", + " 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789,\n", + " 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789,\n", + " 5789, 5789, 5789, 5789, 
5789, 5789, 5789, 5789, 5789,\n", + " 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789, 5789,\n", + " 5789, 5789, 5789, 5789, 5789, 5789],\n", + " [24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113, 24113,\n", + " 24113, 28542, 28542, 28542, 28542, 28542, 28542, 28542, 28542,\n", + " 28542, 28542, 28542, 28542, 28542, 28542, 28542, 28542, 28542,\n", + " 28542, 28542, 28542, 28542, 28542, 28542],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633, 12633,\n", + " 12633, 12633, 12633, 12633, 12633, 12633],\n", + " [26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 
26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920, 26920,\n", + " 26920, 26920, 26920, 26920, 26920, 26920],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971, 31971,\n", + " 31971, 31971, 31971, 31971, 31971, 31971],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 9038, 9038, 9038, 9038, 9038, 9038, 9038,\n", + " 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038,\n", + " 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038,\n", + " 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038, 9038,\n", + " 9038, 9038, 9038, 9038, 9038, 9038],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 
16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679],\n", + " [20256, 20256, 20256, 20256, 20256, 20256, 20256, 20256, 20256,\n", + " 20256, 20256, 20256, 20256, 20256, 20256, 20256, 20256, 20256,\n", + " 20256, 20256, 20256, 20256, 20256, 20256, 20256, 20256, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702, 11702,\n", + " 11702, 11702, 11702, 11702, 11702, 11702],\n", + " [ 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843, 7843,\n", + " 7843, 7843, 7843, 7843, 7843, 7843, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679, 16679,\n", + " 16679, 16679, 16679, 16679, 16679, 16679]], dtype=int32),\n", + " 'scores': array([-168.1183 , -127.506546, -139.3419 , -137.75621 , -117.58283 ,\n", + " -146.28534 , -137.3829 , -142.91571 , -138.62807 , -139.07626 ],\n", + " dtype=float32)},\n", + " 9.59,\n", + " 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], 
\n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [09:59<00:00, 2.61it/s, accuracy=0.311, cost=4.12] \n", + "minibatch loop: 100%|██████████| 40/40 [00:08<00:00, 4.85it/s, accuracy=0.355, cost=3.69]\n", + "minibatch loop: 0%| | 0/1563 [00:00 1])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 1])" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.17100729" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/48.conv-encoder-conv-decoder.ipynb b/neural-machine-translation/48.conv-encoder-conv-decoder.ipynb deleted file mode 100644 index df95cd2..0000000 --- a/neural-machine-translation/48.conv-encoder-conv-decoder.ipynb +++ /dev/null @@ -1,531 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - 
"source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from 
= ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def 
pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "emb_size = 256\n", - "n_hidden = 256\n", - "n_layers = 4\n", - "n_attn_heads = 16\n", - "learning_rate = 1e-4\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def encoder_block(inp, n_hidden, filter_size):\n", - " inp = tf.expand_dims(inp, 2)\n", - " inp = tf.pad(inp, [[0, 0], [(filter_size[0]-1)//2, (filter_size[0]-1)//2], [0, 0], [0, 0]])\n", - " conv = tf.layers.conv2d(inp, n_hidden, filter_size, padding=\"VALID\", activation=None)\n", - " conv = tf.squeeze(conv, 2)\n", - " return conv\n", - "\n", - "def decoder_block(inp, n_hidden, filter_size):\n", - " inp = tf.expand_dims(inp, 2)\n", - " inp = tf.pad(inp, [[0, 0], [filter_size[0]-1, 0], [0, 0], [0, 0]])\n", - " conv = tf.layers.conv2d(inp, n_hidden, filter_size, padding=\"VALID\", activation=None)\n", - " conv = tf.squeeze(conv, 2)\n", - " return conv\n", - "\n", - "def glu(x):\n", - " return tf.multiply(x[:, :, :tf.shape(x)[2]//2], tf.sigmoid(x[:, :, tf.shape(x)[2]//2:]))\n", - "\n", - "def layer(inp, conv_block, kernel_width, n_hidden, residual=None):\n", - " z = conv_block(inp, n_hidden, (kernel_width, 1))\n", - " return glu(z) + (residual if residual is not None else 0)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, 
- "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self):\n", - "\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - "\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([len(dictionary_from), emb_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([len(dictionary_to), emb_size], -1, 1))\n", - " \n", - " def forward(x, y,reuse=False):\n", - " with tf.variable_scope('forward',reuse=reuse):\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, x)\n", - " decoder_embedded = tf.nn.embedding_lookup(decoder_embedding, y)\n", - "\n", - " e = tf.identity(encoder_embedded)\n", - "\n", - " for i in range(n_layers):\n", - " z = layer(encoder_embedded, encoder_block, 3, n_hidden * 2, encoder_embedded)\n", - " encoder_embedded = z\n", - "\n", - " encoder_output, output_memory = z, z + e\n", - " g = tf.identity(decoder_embedded)\n", - "\n", - " for i in range(n_layers):\n", - " attn_res = h = layer(decoder_embedded, decoder_block, 3, n_hidden * 2, \n", - " residual=tf.zeros_like(decoder_embedded))\n", - " C = []\n", - " for j in range(n_attn_heads):\n", - " h_ = tf.layers.dense(h, n_hidden//n_attn_heads)\n", - " g_ = tf.layers.dense(g, n_hidden//n_attn_heads)\n", - " zu_ = tf.layers.dense(encoder_output, n_hidden//n_attn_heads)\n", - " ze_ = tf.layers.dense(output_memory, n_hidden//n_attn_heads)\n", - "\n", - " d = tf.layers.dense(h_, n_hidden//n_attn_heads) + g_\n", - " dz = tf.matmul(d, tf.transpose(zu_, [0, 2, 1]))\n", - " a = tf.nn.softmax(dz)\n", - " c_ = tf.matmul(a, ze_)\n", - " 
C.append(c_)\n", - "\n", - " c = tf.concat(C, 2)\n", - " h = tf.layers.dense(attn_res + c, n_hidden)\n", - " decoder_embedded = h\n", - "\n", - " decoder_output = tf.sigmoid(h)\n", - " return tf.layers.dense(decoder_output, len(dictionary_to))\n", - " self.training_logits = forward(self.X, decoder_input)\n", - " self.logits = forward(self.X, self.Y, reuse=True)\n", - " self.k = tf.placeholder(dtype = tf.int32)\n", - " p = tf.nn.softmax(self.logits)\n", - " self.topk_logprobs, self.topk_ids = tf.nn.top_k(tf.log(p), self.k)\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot()\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 7.362195, avg accuracy: 0.002599\n", - "epoch: 2, avg loss: 6.876383, avg accuracy: 0.035388\n", - "epoch: 3, avg loss: 6.452302, avg accuracy: 0.040414\n", - "epoch: 4, avg loss: 6.195236, avg accuracy: 0.050942\n", - "epoch: 5, avg loss: 6.053087, avg accuracy: 0.079203\n", - "epoch: 6, avg loss: 
5.970512, avg accuracy: 0.090725\n", - "epoch: 7, avg loss: 5.907717, avg accuracy: 0.099582\n", - "epoch: 8, avg loss: 5.851829, avg accuracy: 0.107494\n", - "epoch: 9, avg loss: 5.796880, avg accuracy: 0.113903\n", - "epoch: 10, avg loss: 5.744820, avg accuracy: 0.124458\n", - "epoch: 11, avg loss: 5.684102, avg accuracy: 0.128438\n", - "epoch: 12, avg loss: 5.622067, avg accuracy: 0.133466\n", - "epoch: 13, avg loss: 5.562336, avg accuracy: 0.139069\n", - "epoch: 14, avg loss: 5.497786, avg accuracy: 0.142818\n", - "epoch: 15, avg loss: 5.434413, avg accuracy: 0.149519\n", - "epoch: 16, avg loss: 5.365533, avg accuracy: 0.154012\n", - "epoch: 17, avg loss: 5.300494, avg accuracy: 0.159030\n", - "epoch: 18, avg loss: 5.232462, avg accuracy: 0.165978\n", - "epoch: 19, avg loss: 5.165458, avg accuracy: 0.172521\n", - "epoch: 20, avg loss: 5.096660, avg accuracy: 0.177135\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " accuracy,loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "class Hypothesis:\n", - " def __init__(self, log_prob, seq):\n", - " self.log_prob = log_prob\n", - " self.seq = seq\n", - "\n", - " @property\n", - " def step(self):\n", - " return len(self.seq) - 1\n", - "\n", - "\n", - "def beam_search(\n", - " batch_x,\n", - " 
beam_size,\n", - " num_ans = 5,\n", - " normalize_by_len = 1.0,\n", - "):\n", - " assert 0 <= normalize_by_len <= 1\n", - " batch_size = len(batch_x)\n", - " max_len = len(batch_x[0]) * 2\n", - " dec_inputs = np.ones((batch_size, 2), dtype=np.int32)\n", - " answers = [[] for i in range(batch_size)]\n", - " H = [[] for i in range(batch_size)]\n", - " \n", - " tkl, tkid = sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " for i in range(batch_size):\n", - " for j, log_prob in enumerate(tkl[i, 0]):\n", - " if tkid[i, 0, j] != EOS:\n", - " h = Hypothesis(log_prob, [1, tkid[i, 0, j]])\n", - " H[i].append(h)\n", - " H[i].sort(key=lambda h: h.log_prob)\n", - " \n", - " done = [False] * batch_size\n", - " while not all(done):\n", - " tkl_beam = []\n", - " tkid_beam = []\n", - " dec_inputs_beam = []\n", - " steps_beam = []\n", - " for i in range(beam_size):\n", - " steps = [1] * batch_size\n", - " prev_log_probs = np.zeros(batch_size, dtype=np.float32)\n", - " dec_inputs = np.ones((batch_size, max_len), dtype=np.int32)\n", - " for j, h in enumerate(H):\n", - " while h:\n", - " hi = h.pop()\n", - " lp, step, candidate_seq = hi.log_prob, hi.step, hi.seq\n", - " if candidate_seq[-1] != EOS:\n", - " dec_inputs[j, :len(candidate_seq)] = candidate_seq\n", - " steps[j] = step\n", - " prev_log_probs[j] = lp\n", - " break\n", - " else:\n", - " answers[j].append((lp, candidate_seq))\n", - " max_step = max(steps)\n", - " dec_inputs = dec_inputs[:, :max_step + 2]\n", - " tkl, tkid = sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " \n", - " tkl_beam.append(tkl + prev_log_probs[:, None, None])\n", - " tkid_beam.append(tkid)\n", - " dec_inputs_beam.append(dec_inputs.copy())\n", - " steps_beam.append(steps)\n", - " \n", - " for i in range(beam_size):\n", - " tkl = 
tkl_beam[i]\n", - " tkid = tkid_beam[i]\n", - " dec_inputs = dec_inputs_beam[i]\n", - " steps = steps_beam[i]\n", - " for j in range(batch_size):\n", - " step = steps[j]\n", - " for k in range(tkid.shape[2]):\n", - " extended_seq = np.hstack((dec_inputs[j, :step+1], [tkid[j, step, k]]))\n", - " log_prob = tkl[j, step, k]\n", - " if len(extended_seq) <= max_len and log_prob > -10:\n", - " h = Hypothesis(log_prob, extended_seq)\n", - " H[j].append(h)\n", - " H[j].sort(key=lambda h: h.log_prob / (h.step**normalize_by_len))\n", - " \n", - " for i in range(batch_size):\n", - " done[i] = (len(answers[i]) >= num_ans) or (not H[i]) or (len(H[i]) > 100)\n", - " \n", - " return answers " - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "beamed = beam_search(batch_x, 5)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "beamed = [i for i in beamed if len(i)]" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "predicted = [max(b, key = lambda t: t[0])[1] for b in beamed]" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: tôi . 
\n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: tôi \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(predicted)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/48.transformer-encoder-lstm-decoder-greedy.ipynb b/neural-machine-translation/48.transformer-encoder-lstm-decoder-greedy.ipynb new file mode 100644 index 0000000..d49dcaf --- /dev/null +++ b/neural-machine-translation/48.transformer-encoder-lstm-decoder-greedy.ipynb @@ -0,0 +1,949 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + 
"metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from transformer import utils" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "\n", + "BASE_PARAMS = defaultdict(\n", + " lambda: None, # Set default value to None.\n", + "\n", + " # Input params\n", + " default_batch_size=2048, # Maximum number of tokens per batch of examples.\n", + " default_batch_size_tpu=32768,\n", + " max_length=256, # Maximum number of tokens per example.\n", + "\n", + " # Model params\n", + " initializer_gain=1.0, # Used in trainable variable initialization.\n", + " vocab_size=vocab_size, # Number of tokens defined in the vocabulary file.\n", + " hidden_size=512, # Model dimension in the hidden layers.\n", + " num_hidden_layers=6, # Number of layers in the encoder and decoder stacks.\n", + " num_heads=8, # Number of heads to use in multi-headed attention.\n", + " filter_size=2048, # Inner layer dimension in the feedforward network.\n", + "\n", + " # Dropout values (only used when training)\n", + " layer_postprocess_dropout=0.1,\n", + " attention_dropout=0.1,\n", + " relu_dropout=0.1,\n", + "\n", + " # Training params\n", + " 
label_smoothing=0.1,\n", + " learning_rate=2.0,\n", + " learning_rate_decay_rate=1.0,\n", + " learning_rate_warmup_steps=16000,\n", + "\n", + " # Optimizer params\n", + " optimizer_adam_beta1=0.9,\n", + " optimizer_adam_beta2=0.997,\n", + " optimizer_adam_epsilon=1e-09,\n", + "\n", + " # Default prediction params\n", + " extra_decode_length=50,\n", + " beam_size=4,\n", + " alpha=0.6, # used to calculate length normalization in beam search\n", + "\n", + " # TPU specific parameters\n", + " use_tpu=False,\n", + " static_batch=False,\n", + " allow_ffn_pad=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:24: The name tf.layers.Layer is deprecated. Please use tf.compat.v1.layers.Layer instead.\n", + "\n" + ] + } + ], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "from transformer import embedding_layer\n", + "from transformer.transformer import EncoderStack\n", + "from transformer import model_utils\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, num_layers, train = True, learning_rate = 1e-4):\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " self.embedding_softmax_layer = embedding_layer.EmbeddingSharedWeights(\n", + " BASE_PARAMS[\"vocab_size\"], BASE_PARAMS[\"hidden_size\"],\n", + " method=\"gather\")\n", + " self.encoder_stack = 
EncoderStack(BASE_PARAMS, train)\n", + " with tf.name_scope(\"encode\"):\n", + " # Prepare inputs to the layer stack by adding positional encodings and\n", + " # applying dropout.\n", + " embedded_inputs = self.embedding_softmax_layer(self.X)\n", + " inputs_padding = model_utils.get_padding(self.X)\n", + " attention_bias = model_utils.get_padding_bias(self.X)\n", + "\n", + " with tf.name_scope(\"add_pos_encoding\"):\n", + " length = tf.shape(embedded_inputs)[1]\n", + " pos_encoding = model_utils.get_position_encoding(\n", + " length, BASE_PARAMS[\"hidden_size\"])\n", + " encoder_inputs = embedded_inputs + pos_encoding\n", + "\n", + " if train:\n", + " encoder_inputs = tf.nn.dropout(\n", + " encoder_inputs, 1 - BASE_PARAMS[\"layer_postprocess_dropout\"])\n", + "\n", + " self.encoded = self.encoder_stack(encoder_inputs, attention_bias, inputs_padding)\n", + " print(self.encoded)\n", + " \n", + " first_token_tensor = tf.squeeze(\n", + " self.encoded[:, 0:1, :], axis = 1\n", + " )\n", + " c = tf.layers.dense(\n", + " first_token_tensor,\n", + " BASE_PARAMS[\"hidden_size\"],\n", + " activation = tf.tanh,\n", + " )\n", + " h = tf.layers.dense(\n", + " first_token_tensor,\n", + " BASE_PARAMS[\"hidden_size\"],\n", + " activation = tf.tanh,\n", + " )\n", + "\n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(BASE_PARAMS[\"hidden_size\"],initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=c, h=h)\n", + " \n", + " encoder_state = tuple([lstm_state] * num_layers)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " \n", + " embedding = self.embedding_softmax_layer.shared_weights\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs 
= tf.nn.embedding_lookup(embedding, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embedding,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " \n", + " xentropy, weights = utils.padded_cross_entropy_loss(\n", + " self.training_logits, self.Y, BASE_PARAMS[\"label_smoothing\"], BASE_PARAMS[\"vocab_size\"])\n", + " self.cost = tf.reduce_sum(xentropy) / tf.reduce_sum(weights)\n", + " \n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " 
correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:39: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:48: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:48: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:51: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:70: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From :39: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/ffn_layer.py:65: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/ffn_layer.py:65: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "Tensor(\"encode/encoder_stack/layer_normalization/add_1:0\", shape=(?, ?, 512), dtype=float32)\n", + "WARNING:tensorflow:From :50: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From :59: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :64: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and 
will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/utils.py:82: The name tf.log is deprecated. 
Please use tf.math.log instead.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(num_layers = 2)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 1584, 8487, 8487, 8487, 8487, 8487, 21053, 31856, 31856,\n", + " 31856, 31856, 31856, 31856, 31856, 31856, 31856, 9791, 9791,\n", + " 9791, 9791, 9791, 9791, 2356, 13770, 13770, 13770, 13770,\n", + " 13770, 13770, 3685, 3685, 3685, 1210, 1210, 25848, 25848,\n", + " 25848, 5243, 5243, 5243, 3273, 11782, 11782, 11782, 10310,\n", + " 10310, 23115, 23115, 23115, 12151, 12151, 12151, 12151, 12151,\n", + " 11810, 3376, 7333, 7333, 7333, 14306, 14306, 14306, 14306,\n", + " 22549, 22549, 22549, 6729, 16134, 16134, 16134, 16134, 3107],\n", + " [30245, 30245, 30245, 19217, 12923, 12923, 12923, 12923, 14609,\n", + " 14609, 6175, 6175, 6175, 6175, 6175, 6175, 6175, 6175,\n", + " 7999, 10259, 10259, 10259, 10259, 10259, 10259, 27756, 6938,\n", + " 6938, 6938, 6938, 6938, 17095, 17095, 17095, 29059, 29059,\n", + " 13342, 12696, 12696, 21499, 7348, 7348, 7348, 11768, 11768,\n", + " 11768, 11768, 28451, 28451, 20921, 20921, 13516, 13516, 5371,\n", + " 5371, 5371, 5371, 5371, 7513, 19647, 6604, 19647, 20287,\n", + " 20287, 20287, 20287, 20287, 914, 914, 914, 6653, 6653],\n", + " [ 4258, 4258, 16237, 29131, 29131, 29131, 29131, 29131, 29131,\n", + " 1149, 1149, 1149, 1149, 23958, 23958, 23958, 23958, 23958,\n", + " 23958, 23958, 23958, 23958, 15456, 15456, 15456, 7152, 7152,\n", + " 26405, 26405, 26405, 4060, 8118, 8118, 24555, 24555, 16495,\n", + " 16495, 11441, 33, 16915, 24312, 24312, 2105, 2105, 14821,\n", + " 14821, 23826, 23826, 24312, 
24312, 8306, 8306, 8306, 8306,\n", + " 10030, 2392, 2392, 19341, 19341, 4663, 4663, 4663, 4663,\n", + " 4663, 4663, 4663, 9106, 30281, 18878, 18878, 23308, 23308],\n", + " [27249, 18266, 18266, 18266, 18266, 20455, 20455, 20455, 20455,\n", + " 20455, 6455, 25248, 25248, 25248, 25248, 25248, 25248, 25248,\n", + " 25248, 25248, 25248, 19283, 19283, 30171, 27579, 27579, 27579,\n", + " 30171, 23245, 27754, 27754, 5273, 5273, 5273, 23479, 19563,\n", + " 7740, 7740, 23479, 2790, 23521, 23521, 13976, 13976, 29844,\n", + " 29844, 29844, 12102, 12102, 12102, 12072, 28510, 6810, 9048,\n", + " 9048, 9048, 9048, 20050, 16995, 16995, 19365, 19365, 19365,\n", + " 19365, 19365, 19365, 8495, 2949, 23197, 23197, 23197, 23197],\n", + " [26977, 26977, 26977, 27925, 27925, 27925, 27925, 27925, 25367,\n", + " 25367, 25367, 25367, 25367, 25367, 6083, 6083, 6083, 6083,\n", + " 27708, 8580, 8580, 8580, 15456, 15456, 15456, 9243, 9243,\n", + " 9243, 24813, 24813, 29594, 6149, 13929, 19567, 19567, 19567,\n", + " 5595, 5595, 5595, 5595, 5595, 8715, 6287, 3730, 14061,\n", + " 1626, 1626, 1626, 21497, 21497, 21497, 20521, 3839, 27762,\n", + " 27762, 21263, 21263, 21263, 12969, 13054, 13054, 13054, 31077,\n", + " 31077, 31077, 15947, 15947, 31077, 15052, 15052, 15947, 15947],\n", + " [ 3632, 4258, 3594, 14878, 14878, 14878, 14878, 14878, 3784,\n", + " 3784, 8359, 8359, 8359, 17454, 17454, 14987, 14987, 14987,\n", + " 14987, 14987, 14987, 14987, 14987, 14987, 14987, 14987, 14987,\n", + " 14987, 14987, 14987, 14987, 4538, 27189, 27189, 2660, 14337,\n", + " 14337, 14337, 13444, 19890, 19890, 19890, 19890, 9642, 6193,\n", + " 6193, 6193, 3441, 3909, 28706, 15900, 15900, 31597, 31597,\n", + " 31597, 27788, 27788, 3309, 13110, 13110, 23568, 757, 30792,\n", + " 30792, 23568, 13075, 13075, 13075, 28755, 28755, 28755, 30949],\n", + " [ 6019, 7271, 7271, 7638, 16890, 16890, 16890, 9814, 17932,\n", + " 17932, 17932, 25186, 25186, 25186, 25186, 13216, 13216, 20984,\n", + " 20984, 20984, 20984, 24157, 
24157, 24157, 28151, 28151, 10582,\n", + " 15978, 15978, 15978, 15978, 15978, 4937, 4937, 15088, 20678,\n", + " 7801, 7801, 20678, 4012, 4012, 4012, 9844, 31011, 31011,\n", + " 31011, 31011, 16000, 16000, 9347, 29847, 29847, 17779, 4612,\n", + " 4612, 4612, 11597, 11597, 28698, 28698, 28698, 5712, 14933,\n", + " 14933, 14933, 20523, 15927, 15927, 15927, 27889, 27889, 19547],\n", + " [10237, 10237, 8219, 28485, 28485, 28485, 28485, 28485, 28485,\n", + " 28485, 28485, 28485, 11708, 11708, 11708, 11708, 11708, 9938,\n", + " 9938, 9938, 9938, 9938, 22653, 22653, 16327, 16327, 16327,\n", + " 16582, 16582, 29607, 29607, 29607, 19236, 9760, 9760, 9760,\n", + " 29167, 29167, 29167, 26223, 26223, 26223, 26611, 26223, 26223,\n", + " 14544, 25195, 25195, 25195, 9261, 6791, 6791, 15413, 15413,\n", + " 15413, 15413, 1189, 16323, 16323, 15107, 15107, 15107, 9947,\n", + " 9947, 6063, 17819, 30453, 30453, 4128, 29946, 11952, 11952],\n", + " [28869, 28869, 18429, 8282, 8282, 8282, 8282, 8282, 3378,\n", + " 3378, 3378, 3378, 3378, 3378, 4463, 4463, 4463, 4463,\n", + " 24465, 24158, 24158, 24158, 25390, 9718, 9718, 9718, 25718,\n", + " 25718, 25718, 14504, 21552, 21552, 21552, 20609, 20609, 5059,\n", + " 5059, 18574, 18574, 18574, 26819, 31704, 6777, 6777, 6777,\n", + " 6777, 21266, 21266, 21266, 21266, 21266, 17566, 17566, 16164,\n", + " 16164, 12992, 12992, 3800, 22274, 22274, 22274, 22055, 22055,\n", + " 22055, 30729, 30729, 23667, 4393, 4393, 4393, 4393, 4393],\n", + " [14912, 1172, 1172, 1172, 1172, 9983, 29068, 29068, 29068,\n", + " 29068, 29068, 29068, 29068, 29068, 29068, 20722, 20722, 24901,\n", + " 24901, 24901, 24901, 24901, 24901, 24901, 18260, 18260, 18260,\n", + " 18260, 18260, 23992, 25418, 25418, 25418, 4895, 20353, 20353,\n", + " 20353, 20353, 26525, 26525, 26525, 26525, 5908, 11914, 11914,\n", + " 11739, 11739, 11739, 30829, 30829, 6703, 7126, 10264, 10264,\n", + " 10264, 18178, 18178, 18178, 18178, 18178, 30470, 30470, 22539,\n", + " 23507, 23507, 5938, 5938, 5938, 
6034, 5546, 24574, 24574]],\n", + " dtype=int32), 9.010772, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:17<00:00, 2.53it/s, accuracy=0.103, cost=5.99] \n", + "minibatch loop: 100%|██████████| 40/40 [00:07<00:00, 5.42it/s, accuracy=0.129, cost=5.64] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0.049064703" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} 
diff --git a/neural-machine-translation/49.bertmultilanguage-encoder-bertmultilanguage-decoder.ipynb b/neural-machine-translation/49.bertmultilanguage-encoder-bertmultilanguage-decoder.ipynb new file mode 100644 index 0000000..422c2f5 --- /dev/null +++ b/neural-machine-translation/49.bertmultilanguage-encoder-bertmultilanguage-decoder.ipynb @@ -0,0 +1,1003 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip\n", + "# !unzip multi_cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "BERT_VOCAB = 'multi_cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'multi_cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'multi_cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "GO = 101\n", + "EOS = 102" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from unidecode import unidecode\n", + "\n", + "def get_inputs(x, y):\n", + " input_ids, input_masks, segment_ids, ys = [], [], [], []\n", + " for i in tqdm(range(len(x))):\n", + " tokens_a = tokenizer.tokenize(unidecode(x[i]))\n", + " tokens_b = tokenizer.tokenize(unidecode(y[i]))\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " \n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " \n", + " r = tokenizer.convert_tokens_to_ids(tokens_b + [\"[SEP]\"])\n", + " if len([k for k in r if k == 0]):\n", + " print(y[i], i)\n", + " break\n", + " \n", + " ys.append(r)\n", + " \n", + " return input_ids, input_masks, segment_ids, ys" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 200000/200000 [02:25<00:00, 1372.95it/s]\n" + ] + } + ], + "source": [ + "train_input_ids, train_input_masks, train_segment_ids, train_Y = get_inputs(train_X, train_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + 
"100%|██████████| 5000/5000 [00:03<00:00, 1448.79it/s]\n" + ] + } + ], + "source": [ + "test_input_ids, test_input_masks, test_segment_ids, test_Y = get_inputs(test_X, test_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "epoch = 20\n", + "batch_size = 16\n", + "warmup_proportion = 0.1\n", + "num_train_steps = len(train_input_ids)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "import bert_decoder as modeling_decoder\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " learning_rate = 2e-5,\n", + " training = True,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " def forward(x, segment, masks, y, reuse = False, config = bert_config):\n", + " \n", + " with tf.variable_scope('bert',reuse=reuse):\n", + " \n", + " model = modeling.BertModel(\n", + " config=config,\n", + " is_training=training,\n", + " input_ids=x,\n", + " input_mask=masks,\n", + " token_type_ids=segment,\n", + " use_one_hot_embeddings=False)\n", + " memory = model.get_sequence_output()\n", + " \n", + " with tf.variable_scope('bert',reuse=True):\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype=tf.int32)\n", + " y_masks = tf.sequence_mask(Y_seq_len, tf.reduce_max(Y_seq_len), dtype=tf.float32)\n", + " \n", + " model = modeling_decoder.BertModel(\n", + " config=config,\n", + " 
is_training=training,\n", + " input_ids=y,\n", + " input_mask=y_masks,\n", + " memory = memory,\n", + " memory_mask = masks,\n", + " use_one_hot_embeddings=False)\n", + " output_layer = model.get_sequence_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.variable_scope('cls/predictions',reuse=reuse):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " output_layer,\n", + " units = config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + "\n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " return logits\n", + "\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " \n", + " self.training_logits = forward(self.X, self.segment_ids, self.input_masks, decoder_input)\n", + " print(self.training_logits)\n", + "\n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " 
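The `tf.strided_slice` + `tf.concat` pair above implements the standard teacher-forcing shift: drop each target's last token and prepend `GO`, so the decoder input at position t is used to predict target token t. A list-based sketch of the same operation:

```python
# Teacher-forcing shift, mirroring:
#   main = tf.strided_slice(Y, [0, 0], [batch_size, -1], [1, 1])
#   decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)
GO = 101  # [CLS] id, as in the notebook

def shift_right(batch, go=GO):
    # Drop the final token (usually EOS) and prepend GO to every row.
    return [[go] + row[:-1] for row in batch]

decoder_input = shift_right([[5, 6, 102]])
```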
self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " \n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " segment = tf.contrib.seq2seq.tile_batch(self.segment_ids, 1)\n", + " masks = tf.contrib.seq2seq.tile_batch(self.input_masks, 1)\n", + " logits = forward(x, segment, masks, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " bert_config.vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids\n", + " self.fast_result = tf.identity(self.fast_result, name = 'greedy')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. 
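The `symbols_to_logits` closure above is the contract tensor2tensor's `beam_search` expects: given the decoded prefix `ids`, re-run the decoder and return only the logits of the last position. A minimal greedy (beam width 1) analogue of that loop, with a hypothetical deterministic next-token table standing in for the BERT decoder:

```python
GO, EOS = 101, 102
NEXT = {GO: 5, 5: 6, 6: EOS}  # hypothetical transitions (assumption)

def symbols_to_next_id(prefix):
    # Stand-in for logits[:, tf.shape(ids)[1]-1, :] followed by argmax:
    # the real code slices the last time step of the decoder logits.
    return NEXT[prefix[-1]]

def greedy_decode(max_len=10):
    ids = [GO]
    while len(ids) < max_len:
        nxt = symbols_to_next_id(ids)
        ids.append(nxt)
        if nxt == EOS:
            break
    return ids

result = greedy_decode()
```

Note that re-running the full decoder on the prefix at every step is O(n²) in sequence length; the notebook accepts this for simplicity.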
Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "Tensor(\"cls/predictions/MatMul:0\", shape=(?, ?, 119547), dtype=float32)\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. 
Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if 'bert/' + name in name_to_variable:\n", + " assignment_map[name] = name_to_variable['bert/' + 
name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " elif name in name_to_variable:\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " \n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "\n", + "checkpoint = BERT_INIT_CHKPNT\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from multi_cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 12500/12500 [1:11:58<00:00, 2.89it/s, accuracy=0.508, cost=2.58]\n", + "test minibatch loop: 100%|██████████| 313/313 [00:39<00:00, 7.83it/s, accuracy=0.604, cost=1.99]\n", + "train minibatch loop: 0%| | 0/12500 [00:00 3 and i not in [101, 102]])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3 and i not in [101, 102]])" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": 
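The checkpoint-loading helper above has to reconcile two naming conventions: graph variables carry an output suffix like `:0` and live under the extra `bert/` scope created by `tf.variable_scope('bert')`, while checkpoint entries have neither. A stand-alone sketch of that mapping logic, operating on plain name strings instead of `tf.Variable` objects:

```python
import re
import collections

def normalise(var_name):
    # Strip the ':0' output suffix from a graph variable name.
    m = re.match(r'^(.*):\d+$', var_name)
    return m.group(1) if m else var_name

def map_checkpoint(graph_var_names, ckpt_names):
    name_to_var = collections.OrderedDict(
        (normalise(v), v) for v in graph_var_names)
    assignment = collections.OrderedDict()
    for name in ckpt_names:
        # The graph wraps BERT under an extra 'bert/' scope.
        if 'bert/' + name in name_to_var:
            assignment[name] = name_to_var['bert/' + name]
        elif name in name_to_var:
            assignment[name] = name_to_var[name]
    return assignment

am = map_checkpoint(['bert/embeddings/word_embeddings:0'],
                    ['embeddings/word_embeddings'])
```

The resulting dict (checkpoint name → graph variable) is exactly the shape `tf.train.Saver(var_list=...)` accepts for restoring under renamed scopes.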
{}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.37003958" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/49.conv-encoder-lstm.ipynb b/neural-machine-translation/49.conv-encoder-lstm.ipynb deleted file mode 100644 index db39cab..0000000 --- a/neural-machine-translation/49.conv-encoder-lstm.ipynb +++ /dev/null @@ -1,411 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " 
data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 
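A compact stand-alone version of the `build_dataset` cell above. One behaviour worth noting is preserved from the notebook: out-of-vocabulary words fall back to id 0 (`PAD`), not id 3 (`UNK`), and get counted into `count[0]`; with `atleast=1` no word is actually filtered, so this never fires here:

```python
import collections

def build_dataset(words, n_words, atleast=1):
    # Special tokens occupy ids 0-3; real words follow in frequency order.
    count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]
    counter = [c for c in collections.Counter(words).most_common(n_words)
               if c[1] >= atleast]
    count.extend(counter)
    dictionary = {word: i for i, (word, _) in enumerate(count)}
    # As in the notebook, unknown words map to id 0 (PAD), and count[0]
    # records how many such fallbacks occurred.
    data = [dictionary.get(word, 0) for word in words]
    count[0][1] = data.count(0)
    reversed_dictionary = {i: w for w, i in dictionary.items()}
    return data, count, dictionary, reversed_dictionary

data, count, d, rd = build_dataset('a b a'.split(), 10)
```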
'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "emb_size = 256\n", - "n_hidden = 256\n", - "n_layers = 4\n", - "learning_rate = 1e-3\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - 
"cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def encoder_block(inp, n_hidden, filter_size):\n", - " inp = tf.expand_dims(inp, 2)\n", - " inp = tf.pad(inp, [[0, 0], [(filter_size[0]-1)//2, (filter_size[0]-1)//2], [0, 0], [0, 0]])\n", - " conv = tf.layers.conv2d(inp, n_hidden, filter_size, padding=\"VALID\", activation=None)\n", - " conv = tf.squeeze(conv, 2)\n", - " return conv\n", - "\n", - "def glu(x):\n", - " return tf.multiply(x[:, :, :tf.shape(x)[2]//2], tf.sigmoid(x[:, :, tf.shape(x)[2]//2:]))\n", - "\n", - "def layer(inp, conv_block, kernel_width, n_hidden, residual=None):\n", - " z = conv_block(inp, n_hidden, (kernel_width, 1))\n", - " return glu(z) + (residual if residual is not None else 0)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self):\n", - "\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - "\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([len(dictionary_from), emb_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([len(dictionary_to), emb_size], -1, 1))\n", - " \n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, self.X)\n", - " \n", - " e = tf.identity(encoder_embedded)\n", - " for i in range(n_layers):\n", - " z = layer(encoder_embedded, encoder_block, 3, n_hidden * 2, encoder_embedded)\n", - " encoder_embedded = z\n", - " \n", - " encoder_output, output_memory = z, z + e\n", - " \n", - " vocab_proj = tf.layers.Dense(len(dictionary_to))\n", 
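The `glu` helper above is a gated linear unit: the channel dimension (doubled by the conv block, hence `n_hidden * 2`) is split in half, and the first half is gated element-wise by a sigmoid of the second half. A scalar pure-Python sketch of the same computation over a single channel vector:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(channels):
    # Split the channel vector in half; gate the first half with a
    # sigmoid of the second half, as in the conv-encoder's glu().
    half = len(channels) // 2
    return [a * sigmoid(b) for a, b in zip(channels[:half], channels[half:])]

out = glu([1.0, 2.0, 0.0, 0.0])  # sigmoid(0) = 0.5, so gates halve the values
```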
- " init_state = tf.reduce_mean(output_memory,axis=1)\n", - " cell = tf.nn.rnn_cell.LSTMCell(n_hidden)\n", - " helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = tf.to_int32(self.Y_seq_len))\n", - " encoder_state = tf.nn.rnn_cell.LSTMStateTuple(c=init_state, h=init_state)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(cell = cell,\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = vocab_proj)\n", - " decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder = decoder,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " \n", - " helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embedding = decoder_embedding,\n", - " start_tokens = tf.tile(\n", - " tf.constant([GO], \n", - " dtype=tf.int32), \n", - " [tf.shape(init_state)[0]]),\n", - " end_token = EOS)\n", - " decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = cell,\n", - " helper = helper,\n", - " initial_state = encoder_state,\n", - " output_layer = vocab_proj)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = decoder,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " self.logits = decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, 
mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot()\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.570816, avg accuracy: 0.062293\n", - "epoch: 2, avg loss: 5.945370, avg accuracy: 0.096343\n", - "epoch: 3, avg loss: 5.764404, avg accuracy: 0.114156\n", - "epoch: 4, avg loss: 5.598949, avg accuracy: 0.125709\n", - "epoch: 5, avg loss: 5.446877, avg accuracy: 0.131909\n", - "epoch: 6, avg loss: 5.277187, avg accuracy: 0.144107\n", - "epoch: 7, avg loss: 5.091821, avg accuracy: 0.165582\n", - "epoch: 8, avg loss: 4.895868, avg accuracy: 0.177791\n", - "epoch: 9, avg loss: 4.701523, avg accuracy: 0.193142\n", - "epoch: 10, avg loss: 4.513534, avg accuracy: 0.207767\n", - "epoch: 11, avg loss: 4.335670, avg accuracy: 0.226102\n", - "epoch: 12, avg loss: 4.152852, avg accuracy: 0.249391\n", - "epoch: 13, avg loss: 3.971237, avg accuracy: 0.270366\n", - "epoch: 14, avg loss: 3.793662, avg accuracy: 0.290088\n", - "epoch: 15, avg loss: 3.622402, avg accuracy: 0.310878\n", - "epoch: 16, avg loss: 3.466393, avg accuracy: 0.338950\n", - "epoch: 17, avg loss: 3.295971, avg accuracy: 0.365091\n", - "epoch: 18, avg loss: 3.151556, avg accuracy: 0.386027\n", - "epoch: 19, avg loss: 3.006333, avg accuracy: 0.413153\n", - "epoch: 20, avg loss: 2.870467, avg accuracy: 0.438702\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = 
pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi đã từng nghĩ , viên \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: chúng tôi làm việc này . \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi đã làm việc này . 
\n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi đã từng nghĩ , " \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/5.lstm-seq2seq-api-greedy.ipynb b/neural-machine-translation/5.lstm-seq2seq-api-greedy.ipynb deleted file mode 100644 index 38fa81e..0000000 --- a/neural-machine-translation/5.lstm-seq2seq-api-greedy.ipynb +++ /dev/null @@ -1,392 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " 
count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', 
data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", 
- " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " _, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " inputs = tf.nn.embedding_lookup(encoder_embedding, self.X),\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embedding,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = 
predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = 
str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.607375, avg accuracy: 0.049102\n", - "epoch: 2, avg loss: 6.104367, avg accuracy: 0.075862\n", - "epoch: 3, avg loss: 5.976006, avg accuracy: 0.091874\n", - "epoch: 4, avg loss: 5.886806, avg accuracy: 0.111627\n", - "epoch: 5, avg loss: 5.832548, avg accuracy: 0.111744\n", - "epoch: 6, avg loss: 5.772666, avg accuracy: 0.118138\n", - "epoch: 7, avg loss: 5.705306, avg accuracy: 0.121704\n", - "epoch: 8, avg loss: 5.622493, avg accuracy: 0.125727\n", - "epoch: 9, avg loss: 5.525803, avg accuracy: 0.134204\n", - "epoch: 10, avg loss: 5.425576, avg accuracy: 0.141575\n", - "epoch: 11, avg loss: 5.325510, avg accuracy: 0.143837\n", - "epoch: 12, avg loss: 5.228832, avg accuracy: 0.148775\n", - "epoch: 13, avg loss: 5.132649, avg accuracy: 0.154584\n", - "epoch: 14, avg loss: 5.047837, avg accuracy: 0.160110\n", - "epoch: 15, avg loss: 4.950598, avg accuracy: 0.165813\n", - "epoch: 16, avg loss: 4.857364, avg accuracy: 0.172679\n", - "epoch: 17, avg loss: 4.766294, avg accuracy: 0.178798\n", - "epoch: 18, avg loss: 4.682445, avg accuracy: 0.188729\n", - "epoch: 19, avg loss: 4.601195, avg accuracy: 0.195077\n", - "epoch: 20, avg loss: 4.520952, avg accuracy: 0.202590\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k 
in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và một người , tôi là một người , và bạn có thể là một người , và bạn có thể ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: nhưng tôi có thể làm , tôi sẽ làm thế nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi sẽ làm tôi , bạn sẽ làm thế nào ? \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi có thể làm , tôi sẽ làm thế một người , và bạn có thể làm thế một người . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/5.lstm-seq2seq-contrib-greedy.ipynb b/neural-machine-translation/5.lstm-seq2seq-contrib-greedy.ipynb new file mode 100644 index 0000000..da15c93 --- /dev/null +++ b/neural-machine-translation/5.lstm-seq2seq-contrib-greedy.ipynb @@ -0,0 +1,801 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + 
"outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " _, encoder_state = tf.nn.dynamic_rnn(\n", + " cell = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", + " inputs = tf.nn.embedding_lookup(embeddings, self.X),\n", + " sequence_length = self.X_seq_len,\n", + " dtype = tf.float32)\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " dense = tf.layers.Dense(vocab_size)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embeddings, 
decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": 
"code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :23: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :26: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[30862, 31330, 7563, 6904, 6904, 14420, 14420, 5320, 3882,\n", + " 3882, 25569, 4014, 4014, 5510, 2817, 2817, 2817, 8488,\n", + " 11273, 5878, 5637, 1377, 8304, 8304, 5204, 5204, 15208,\n", + " 12328, 12328, 31810, 31810, 31810, 20439, 19659, 19659, 4132,\n", + " 4132, 25043, 1482, 1482, 20925, 20925, 20925, 15262, 15262,\n", + " 
15262, 15262, 15262, 985, 985, 985, 985, 985, 17364,\n", + " 17364, 21699, 21699, 21699, 17364, 15097, 15097, 22568, 31795,\n", + " 31795, 31510, 31510, 10779, 6458, 6458, 145, 145, 5615],\n", + " [13133, 23439, 23439, 18141, 4568, 4568, 13649, 31341, 808,\n", + " 808, 11462, 11462, 1480, 1480, 1480, 27723, 27723, 27723,\n", + " 13440, 16353, 16353, 16353, 20605, 4837, 6188, 6188, 6188,\n", + " 3888, 24510, 24510, 714, 714, 809, 809, 4190, 19513,\n", + " 19513, 19513, 7917, 7917, 7917, 8973, 3280, 3280, 24510,\n", + " 24510, 24510, 25452, 7617, 7617, 7617, 26629, 993, 29224,\n", + " 29224, 17531, 17531, 17531, 17531, 16739, 16739, 16739, 25561,\n", + " 16774, 24148, 24148, 24148, 7669, 7669, 7669, 7669, 7669],\n", + " [10387, 15069, 14202, 26205, 26205, 27461, 17987, 15172, 17987,\n", + " 10596, 4460, 1914, 12050, 12050, 30584, 8982, 9221, 9221,\n", + " 9221, 9221, 26518, 26518, 26518, 19941, 22487, 22487, 22487,\n", + " 2744, 2744, 2744, 2744, 2744, 11374, 19791, 19791, 19791,\n", + " 8720, 8720, 26791, 1952, 25679, 14608, 14608, 12115, 17667,\n", + " 10218, 10218, 18171, 18171, 20997, 29896, 30158, 30158, 30158,\n", + " 30158, 10967, 10967, 349, 349, 349, 349, 20231, 20231,\n", + " 20231, 22875, 4251, 26256, 26256, 4251, 26256, 25270, 30956],\n", + " [17129, 25926, 10990, 21443, 12768, 12768, 12768, 17867, 2925,\n", + " 2925, 2925, 30130, 30130, 16013, 5830, 5830, 4894, 4894,\n", + " 4894, 20143, 20143, 20143, 26802, 8537, 8537, 8537, 20865,\n", + " 20865, 2439, 28033, 28033, 25857, 25857, 25857, 25857, 30418,\n", + " 11507, 11507, 24171, 1245, 14881, 14881, 14881, 1646, 1646,\n", + " 15021, 18507, 18507, 22884, 22884, 22884, 21795, 20631, 21795,\n", + " 14721, 2768, 2768, 2768, 2768, 10262, 10262, 10262, 10262,\n", + " 7427, 7427, 7427, 29551, 4018, 4018, 25970, 25970, 15114],\n", + " [ 6164, 18130, 12590, 12590, 22029, 22029, 14948, 7164, 10457,\n", + " 7164, 8252, 14881, 14881, 15305, 15305, 1646, 1646, 9217,\n", + " 9217, 30041, 30041, 30041, 23367, 23367, 
23367, 12176, 12176,\n", + " 12176, 12176, 12176, 12176, 27389, 27389, 27389, 13683, 13683,\n", + " 25753, 25753, 25753, 25753, 3222, 3222, 3222, 31032, 31032,\n", + " 31032, 9520, 5513, 5513, 5513, 30051, 30051, 30051, 29008,\n", + " 29008, 30051, 19999, 12553, 15369, 12553, 27068, 27068, 15771,\n", + " 15771, 15771, 15771, 15771, 24180, 13710, 13710, 13710, 13710],\n", + " [ 6720, 14590, 14590, 23462, 23462, 30270, 30270, 30270, 30270,\n", + " 9655, 9655, 9655, 3925, 3925, 23030, 10941, 10941, 10115,\n", + " 10115, 22491, 94, 9734, 8718, 8718, 13733, 27096, 27096,\n", + " 27096, 12767, 12767, 2167, 23666, 23666, 14726, 14726, 4900,\n", + " 958, 958, 958, 11290, 958, 5010, 5010, 19747, 5267,\n", + " 5267, 5267, 31828, 1623, 1623, 1623, 29611, 29611, 23917,\n", + " 23917, 23360, 23360, 18726, 18726, 19774, 19774, 5943, 5943,\n", + " 221, 221, 221, 8403, 8403, 8403, 3926, 10207, 17169],\n", + " [18851, 11340, 25700, 25700, 29668, 30601, 4782, 24464, 10815,\n", + " 10815, 10815, 10815, 876, 16780, 16780, 16780, 16780, 8640,\n", + " 8640, 25856, 27461, 27461, 2104, 2104, 2104, 26027, 18308,\n", + " 18308, 676, 676, 676, 1298, 1298, 1298, 1298, 28409,\n", + " 28409, 28409, 24181, 24181, 14966, 14966, 14966, 14966, 27829,\n", + " 27829, 1617, 3132, 3132, 25578, 25578, 11490, 28680, 28680,\n", + " 30474, 9796, 9796, 1281, 16502, 19883, 19883, 19571, 19571,\n", + " 19571, 19571, 22521, 22521, 22521, 23214, 23214, 23853, 20656],\n", + " [ 3058, 14447, 26205, 25019, 4542, 4542, 31133, 31133, 20898,\n", + " 20898, 20898, 7916, 10869, 10869, 10869, 31337, 10869, 7399,\n", + " 7399, 1692, 20518, 6416, 6416, 23824, 23824, 9857, 9857,\n", + " 9857, 9857, 21825, 6794, 6794, 10779, 10779, 10121, 10121,\n", + " 10121, 10121, 30109, 30109, 30109, 11582, 29442, 29442, 3128,\n", + " 3128, 7128, 7128, 25740, 25740, 25740, 5143, 7755, 7755,\n", + " 7755, 630, 31952, 7755, 9976, 3878, 3878, 3878, 3878,\n", + " 3878, 3878, 23798, 23798, 15016, 23798, 1306, 1306, 4477],\n", + " [13597, 
13597, 13597, 13597, 20651, 20651, 20651, 20651, 24766,\n", + " 31083, 7296, 7296, 7296, 7296, 7296, 7296, 17042, 17042,\n", + " 5782, 20873, 20873, 20873, 20873, 20873, 11467, 11467, 27926,\n", + " 27926, 27926, 27926, 12095, 12095, 8448, 30264, 30264, 10778,\n", + " 13596, 10778, 24407, 24407, 23176, 18902, 18902, 18902, 30810,\n", + " 30810, 24019, 24019, 3817, 24724, 24724, 24437, 21076, 21076,\n", + " 28865, 28865, 7518, 30761, 30761, 4190, 14074, 651, 651,\n", + " 7242, 809, 9368, 17583, 7320, 11770, 11770, 13779, 12818],\n", + " [31185, 31185, 3062, 17654, 17654, 22079, 24136, 1765, 21788,\n", + " 21788, 13298, 30743, 30743, 30743, 12946, 30190, 30190, 4244,\n", + " 4244, 4244, 22705, 22705, 31853, 31853, 31853, 22705, 5248,\n", + " 5248, 23739, 19055, 19055, 19055, 5248, 20593, 20593, 20593,\n", + " 14267, 14267, 29946, 23329, 23329, 4287, 2133, 2133, 9073,\n", + " 15932, 29761, 29761, 29761, 29761, 16181, 16181, 25048, 25048,\n", + " 17627, 17627, 9932, 7953, 1529, 1529, 1529, 22211, 20732,\n", + " 27421, 14005, 14005, 16275, 16275, 12778, 12778, 19235, 7967]],\n", + " dtype=int32), 10.373379, 0.005076142]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [15:45<00:00, 1.65it/s, accuracy=0.208, cost=5.16]\n", + "minibatch loop: 100%|██████████| 40/40 [00:13<00:00, 2.88it/s, accuracy=0.21, cost=4.57] \n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": 
{}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/50.bertmultilanguage-encoder-lstm-decoder.ipynb b/neural-machine-translation/50.bertmultilanguage-encoder-lstm-decoder.ipynb new file mode 100644 index 0000000..fb544c7 --- /dev/null +++ b/neural-machine-translation/50.bertmultilanguage-encoder-lstm-decoder.ipynb @@ -0,0 +1,946 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip\n", + "# !unzip multi_cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "BERT_VOCAB = 'multi_cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'multi_cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'multi_cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "GO = 101\n", + "EOS = 102" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from unidecode import unidecode\n", + "\n", + "def get_inputs(x, y):\n", + " input_ids, input_masks, segment_ids, ys = [], [], [], []\n", + " for i in tqdm(range(len(x))):\n", + " tokens_a = tokenizer.tokenize(unidecode(x[i]))\n", + " tokens_b = tokenizer.tokenize(unidecode(y[i]))\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " \n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " \n", + " r = tokenizer.convert_tokens_to_ids(tokens_b + [\"[SEP]\"])\n", + " if len([k for k in r if k == 0]):\n", + " print(y[i], i)\n", + " break\n", + " \n", + " ys.append(r)\n", + " \n", + " return input_ids, input_masks, segment_ids, ys" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 200000/200000 [02:39<00:00, 1255.89it/s]\n" + ] + } + ], + "source": [ + "train_input_ids, train_input_masks, train_segment_ids, train_Y = get_inputs(train_X, train_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + 
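The `get_inputs` function above packs each source sentence into the standard BERT input triple: ids wrapped in `[CLS] … [SEP]`, an all-ones attention mask, and all-zero segment ids for single-sentence input (in the multilingual checkpoint, `[CLS]` is id 101 and `[SEP]` is 102, matching the notebook's `GO`/`EOS`). A minimal sketch of that packing, with toy token ids in place of real WordPiece output:

```python
# Hedged sketch of the BERT input construction used in get_inputs.
# cls_id/sep_id default to the multilingual-BERT values (101/102).

def pack_bert_inputs(token_ids, cls_id=101, sep_id=102):
    input_id = [cls_id] + token_ids + [sep_id]   # [CLS] tokens [SEP]
    input_mask = [1] * len(input_id)             # attend to every real token
    segment_id = [0] * len(input_id)             # single sentence: all zeros
    return input_id, input_mask, segment_id

# Toy token ids standing in for tokenizer.convert_tokens_to_ids(...) output.
ids, mask, seg = pack_bert_inputs([2023, 2003, 1037])
print(ids)   # [101, 2023, 2003, 1037, 102]
```

When batched, shorter examples are padded with id 0 and mask 0, which is why the target side checks that no real token maps to id 0.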
"100%|██████████| 5000/5000 [00:04<00:00, 1089.86it/s]\n" + ] + } + ], + "source": [ + "test_input_ids, test_input_masks, test_segment_ids, test_Y = get_inputs(test_X, test_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "epoch = 20\n", + "batch_size = 16\n", + "warmup_proportion = 0.1\n", + "num_train_steps = len(train_input_ids)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " size_layer,\n", + " num_layers,\n", + " learning_rate = 2e-5,\n", + " training = True,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=training,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " pooled_output = model.get_pooled_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " dense = tf.layers.Dense(bert_config.vocab_size)\n", + " \n", + " def cells(size_layer=size_layer, reuse=False):\n", + " return 
tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=pooled_output, h=pooled_output)\n", + " \n", + " encoder_state = tuple([lstm_state] * num_layers)\n", + " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " \n", + " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(embedding, decoder_input),\n", + " sequence_length = self.Y_seq_len,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " self.training_logits = training_decoder_output.rnn_output\n", + " \n", + " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", + " embedding = embedding,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS)\n", + " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = decoder_cells,\n", + " helper = predicting_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = dense)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", + " self.fast_result = predicting_decoder_output.sample_id\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = 
tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. 
Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From :35: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :40: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and 
will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/contrib/seq2seq/python/ops/decoder.py:420: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(bert_config.hidden_size, 2)\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = 
tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if 'bert/' + name in name_to_variable:\n", + " assignment_map[name] = name_to_variable['bert/' + name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " elif name in name_to_variable:\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " \n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "\n", + "checkpoint = BERT_INIT_CHKPNT\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from multi_cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 12500/12500 [1:38:03<00:00, 2.12it/s, accuracy=0.163, cost=5.76] \n", + "test minibatch loop: 100%|██████████| 313/313 [00:45<00:00, 6.95it/s, accuracy=0.163, cost=5.62]\n", + "train minibatch loop: 0%| | 0/12500 [00:00 3 and i not in [101, 102]])\n", + " results.extend(result)" + ] + }, + { + "cell_type": 
"code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3 and i not in [101, 102]])" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.11384286" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/50.byte-net-greedy.ipynb b/neural-machine-translation/50.byte-net-greedy.ipynb deleted file mode 100644 index 99e8e39..0000000 --- a/neural-machine-translation/50.byte-net-greedy.ipynb +++ /dev/null @@ -1,570 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " 
data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), 
('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, sentence_batch_y, pad_int):\n", - " x, y = [], []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " max_sentence_len_y = max([len(sentence) for sentence in sentence_batch_y])\n", - " max_sentence_len = max(max_sentence_len, max_sentence_len_y)\n", - " for no, sentence in enumerate(sentence_batch):\n", - " x.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " y.append(sentence_batch_y[no] + [pad_int] * (max_sentence_len - len(sentence_batch_y[no])))\n", - " return x, y" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - 
"source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_normalization(inputs, block_name, reuse, epsilon=1e-8):\n", - " with tf.variable_scope(block_name, reuse = reuse):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - "\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - "\n", - " outputs = gamma * normalized + beta\n", - " return outputs\n", - "\n", - "def conv1d(input_, output_channels, block_name, reuse, dilation = 1, filter_width = 1, causal = False):\n", - " with tf.variable_scope(block_name, reuse = reuse):\n", - " w = tf.get_variable('w', [1, filter_width, int(input_.get_shape()[-1]), output_channels],\n", - " tf.float32, tf.initializers.random_normal(stddev = 0.02))\n", - " b = tf.get_variable('b', [output_channels],\n", - " tf.float32, tf.zeros_initializer())\n", - " if causal:\n", - " padding = [[0, 0], [(filter_width - 1) * dilation, 0], [0, 0]]\n", - " padded = tf.pad(input_, padding)\n", - " input_expanded = tf.expand_dims(padded, dim = 1)\n", - " out = tf.nn.atrous_conv2d(input_expanded, w, rate = dilation, padding = 'VALID') + b\n", - " else:\n", - " input_expanded = tf.expand_dims(input_, dim = 1)\n", - " out = tf.nn.atrous_conv2d(input_expanded, w, rate = dilation, padding = 'SAME') + b\n", - " return tf.squeeze(out, [1])\n", - "\n", - "def bytenet_residual_block(input_, dilation, layer_no, \n", - " 
residual_channels, filter_width, block_type,\n", - " causal = True, reuse = False):\n", - " block_name = \"bytenet_{}_layer_{}_{}\".format(block_type, layer_no, dilation)\n", - " print(block_name)\n", - " with tf.variable_scope(block_name, reuse = reuse):\n", - " relu1 = tf.nn.relu(layer_normalization(input_, block_name + '_0', reuse))\n", - " conv1 = conv1d(relu1, residual_channels, block_name + '_0', reuse)\n", - " relu2 = tf.nn.relu(layer_normalization(conv1, block_name + '_1', reuse))\n", - " dilated_conv = conv1d(relu2, residual_channels,\n", - " block_name + '_1', reuse,\n", - " dilation, filter_width,\n", - " causal = causal)\n", - " print(dilated_conv)\n", - " relu3 = tf.nn.relu(layer_normalization(dilated_conv, block_name + '_2', reuse))\n", - " conv2 = conv1d(relu3, 2 * residual_channels, block_name + '_2', reuse)\n", - " return input_ + conv2\n", - " \n", - "class ByteNet:\n", - " def __init__(self, from_vocab_size, to_vocab_size, channels, encoder_dilations,\n", - " decoder_dilations, encoder_filter_width, decoder_filter_width,\n", - " learning_rate = 0.001, beta1=0.5):\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " target_1 = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " embedding_channels = 2 * channels\n", - " max_seq = tf.maximum(tf.reduce_max(self.Y_seq_len), tf.reduce_max(self.X_seq_len))\n", - " w_source_embedding = tf.Variable(tf.random_normal([from_vocab_size, \n", - " embedding_channels], stddev = 0.02))\n", - " w_target_embedding = tf.Variable(tf.random_normal([to_vocab_size, \n", - " embedding_channels], stddev = 0.02))\n", - " \n", - " def forward(x, y, reuse = False):\n", - " source_embedding = 
tf.nn.embedding_lookup(w_source_embedding, x)\n", - " target_1_embedding = tf.nn.embedding_lookup(w_target_embedding, y)\n", - " \n", - " \n", - " curr_input = source_embedding\n", - " for layer_no, dilation in enumerate(encoder_dilations):\n", - " curr_input = bytenet_residual_block(curr_input, dilation, \n", - " layer_no, channels, \n", - " encoder_filter_width,\n", - " 'encoder',\n", - " causal = False, reuse = reuse)\n", - " encoder_output = curr_input\n", - " combined_embedding = target_1_embedding + encoder_output\n", - " curr_input = combined_embedding\n", - " for layer_no, dilation in enumerate(decoder_dilations):\n", - " curr_input = bytenet_residual_block(curr_input, dilation, \n", - " layer_no, channels, \n", - " encoder_filter_width, \n", - " 'decoder',\n", - " causal = False, reuse = reuse)\n", - " with tf.variable_scope('logits', reuse = reuse):\n", - " return conv1d(curr_input, to_vocab_size, 'logits', reuse)\n", - " \n", - " self.logits = forward(self.X, target_1)\n", - " masks = tf.sequence_mask(self.Y_seq_len, max_seq, dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", - " \n", - " def cond(i, y, temp):\n", - " return i < tf.reduce_max(max_seq)\n", - " \n", - " def body(i, y, temp):\n", - " logits = forward(self.X, y, reuse = True)\n", - " ids = tf.argmax(logits, -1)[:, i]\n", - " ids = tf.expand_dims(ids, -1)\n", - " temp = tf.concat([temp[:, 1:], ids], -1)\n", - " y = tf.concat([temp[:, -(i+1):], 
temp[:, :-(i+1)]], -1)\n", - " y = tf.reshape(y, [tf.shape(temp)[0], max_seq])\n", - " i += 1\n", - " return i, y, temp\n", - " \n", - " target = tf.fill([batch_size, max_seq], GO)\n", - " target = tf.cast(target, tf.int64)\n", - " self.target = target\n", - " \n", - " _, self.predicting_ids, _ = tf.while_loop(cond, body, \n", - " [tf.constant(0), target, target])" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "residual_channels = 128\n", - "encoder_dilations = [1,2,4,8,16,1,2,4,8,16]\n", - "decoder_dilations = [1,2,4,8,16,1,2,4,8,16]\n", - "encoder_filter_width = 3\n", - "decoder_filter_width = 3\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bytenet_encoder_layer_0_1\n", - "WARNING:tensorflow:From :25: calling expand_dims (from tensorflow.python.ops.array_ops) with dim is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use the `axis` argument instead\n", - "Tensor(\"bytenet_encoder_layer_0_1/bytenet_encoder_layer_0_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_1_2\n", - "Tensor(\"bytenet_encoder_layer_1_2/bytenet_encoder_layer_1_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_2_4\n", - "Tensor(\"bytenet_encoder_layer_2_4/bytenet_encoder_layer_2_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_3_8\n", - "Tensor(\"bytenet_encoder_layer_3_8/bytenet_encoder_layer_3_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_4_16\n", - "Tensor(\"bytenet_encoder_layer_4_16/bytenet_encoder_layer_4_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_5_1\n", - "Tensor(\"bytenet_encoder_layer_5_1/bytenet_encoder_layer_5_1_1_1/Squeeze:0\", shape=(?, ?, 128), 
dtype=float32)\n", - "bytenet_encoder_layer_6_2\n", - "Tensor(\"bytenet_encoder_layer_6_2/bytenet_encoder_layer_6_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_7_4\n", - "Tensor(\"bytenet_encoder_layer_7_4/bytenet_encoder_layer_7_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_8_8\n", - "Tensor(\"bytenet_encoder_layer_8_8/bytenet_encoder_layer_8_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_9_16\n", - "Tensor(\"bytenet_encoder_layer_9_16/bytenet_encoder_layer_9_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_0_1\n", - "Tensor(\"bytenet_decoder_layer_0_1/bytenet_decoder_layer_0_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_1_2\n", - "Tensor(\"bytenet_decoder_layer_1_2/bytenet_decoder_layer_1_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_2_4\n", - "Tensor(\"bytenet_decoder_layer_2_4/bytenet_decoder_layer_2_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_3_8\n", - "Tensor(\"bytenet_decoder_layer_3_8/bytenet_decoder_layer_3_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_4_16\n", - "Tensor(\"bytenet_decoder_layer_4_16/bytenet_decoder_layer_4_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_5_1\n", - "Tensor(\"bytenet_decoder_layer_5_1/bytenet_decoder_layer_5_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_6_2\n", - "Tensor(\"bytenet_decoder_layer_6_2/bytenet_decoder_layer_6_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_7_4\n", - "Tensor(\"bytenet_decoder_layer_7_4/bytenet_decoder_layer_7_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_8_8\n", - "Tensor(\"bytenet_decoder_layer_8_8/bytenet_decoder_layer_8_8_1_1/Squeeze:0\", shape=(?, ?, 128), 
dtype=float32)\n", - "bytenet_decoder_layer_9_16\n", - "Tensor(\"bytenet_decoder_layer_9_16/bytenet_decoder_layer_9_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_0_1\n", - "Tensor(\"while/bytenet_encoder_layer_0_1/bytenet_encoder_layer_0_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_1_2\n", - "Tensor(\"while/bytenet_encoder_layer_1_2/bytenet_encoder_layer_1_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_2_4\n", - "Tensor(\"while/bytenet_encoder_layer_2_4/bytenet_encoder_layer_2_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_3_8\n", - "Tensor(\"while/bytenet_encoder_layer_3_8/bytenet_encoder_layer_3_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_4_16\n", - "Tensor(\"while/bytenet_encoder_layer_4_16/bytenet_encoder_layer_4_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_5_1\n", - "Tensor(\"while/bytenet_encoder_layer_5_1/bytenet_encoder_layer_5_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_6_2\n", - "Tensor(\"while/bytenet_encoder_layer_6_2/bytenet_encoder_layer_6_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_7_4\n", - "Tensor(\"while/bytenet_encoder_layer_7_4/bytenet_encoder_layer_7_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_8_8\n", - "Tensor(\"while/bytenet_encoder_layer_8_8/bytenet_encoder_layer_8_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_encoder_layer_9_16\n", - "Tensor(\"while/bytenet_encoder_layer_9_16/bytenet_encoder_layer_9_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_0_1\n", - "Tensor(\"while/bytenet_decoder_layer_0_1/bytenet_decoder_layer_0_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_1_2\n", - 
"Tensor(\"while/bytenet_decoder_layer_1_2/bytenet_decoder_layer_1_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_2_4\n", - "Tensor(\"while/bytenet_decoder_layer_2_4/bytenet_decoder_layer_2_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_3_8\n", - "Tensor(\"while/bytenet_decoder_layer_3_8/bytenet_decoder_layer_3_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_4_16\n", - "Tensor(\"while/bytenet_decoder_layer_4_16/bytenet_decoder_layer_4_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_5_1\n", - "Tensor(\"while/bytenet_decoder_layer_5_1/bytenet_decoder_layer_5_1_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_6_2\n", - "Tensor(\"while/bytenet_decoder_layer_6_2/bytenet_decoder_layer_6_2_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_7_4\n", - "Tensor(\"while/bytenet_decoder_layer_7_4/bytenet_decoder_layer_7_4_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_8_8\n", - "Tensor(\"while/bytenet_decoder_layer_8_8/bytenet_decoder_layer_8_8_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n", - "bytenet_decoder_layer_9_16\n", - "Tensor(\"while/bytenet_decoder_layer_9_16/bytenet_decoder_layer_9_16_1_1/Squeeze:0\", shape=(?, ?, 128), dtype=float32)\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = ByteNet(len(dictionary_from), len(dictionary_to), \n", - " residual_channels, encoder_dilations, decoder_dilations,\n", - " encoder_filter_width,decoder_filter_width)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.328035, avg accuracy: 0.072276\n", - "epoch: 2, avg loss: 5.965444, avg accuracy: 0.094979\n", - 
"epoch: 3, avg loss: 5.767229, avg accuracy: 0.105403\n", - "epoch: 4, avg loss: 5.496558, avg accuracy: 0.143506\n", - "epoch: 5, avg loss: 5.061917, avg accuracy: 0.187899\n", - "epoch: 6, avg loss: 4.610906, avg accuracy: 0.227049\n", - "epoch: 7, avg loss: 4.046287, avg accuracy: 0.279907\n", - "epoch: 8, avg loss: 3.496574, avg accuracy: 0.345208\n", - "epoch: 9, avg loss: 2.976003, avg accuracy: 0.413868\n", - "epoch: 10, avg loss: 2.452929, avg accuracy: 0.501045\n", - "epoch: 11, avg loss: 1.891984, avg accuracy: 0.604314\n", - "epoch: 12, avg loss: 1.380336, avg accuracy: 0.710687\n", - "epoch: 13, avg loss: 1.001658, avg accuracy: 0.793238\n", - "epoch: 14, avg loss: 0.774593, avg accuracy: 0.842097\n", - "epoch: 15, avg loss: 0.455521, avg accuracy: 0.927808\n", - "epoch: 16, avg loss: 0.280280, avg accuracy: 0.970531\n", - "epoch: 17, avg loss: 0.159310, avg accuracy: 1.002358\n", - "epoch: 18, avg loss: 0.139503, avg accuracy: 1.003681\n", - "epoch: 19, avg loss: 0.063580, avg accuracy: 1.019896\n", - "epoch: 20, avg loss: 0.025332, avg accuracy: 1.023528\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, batch_y = pad_sentence_batch(X[k: index], Y[k: index], PAD)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(4, 42)" - ] - }, - 
"execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "predicted = sess.run(model.predicting_ids, \n", - " feed_dict={model.X:batch_x,model.Y:batch_y})\n", - "predicted.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: no , no , no . it 's four minutes long .\n", - "REAL ANSWER: không không không . nó chỉ dài bốn phút thôi .\n", - "PREDICTED ANSWER: không chỉ chỉ không được chỉ dài bốn phút mang đó pháp sử đi nếu ấy ngày hợp thao qua qua xấu hoặc trung tượng đi qua kiếm xấu qua rằng vấn tàu vấn rằng rằng thực thực đi chút tàu \n", - "\n", - "row 2\n", - "QUESTION: and you know , it was not a failure for ourselves in itself , but it was a failure that will impact his full life .\n", - "REAL ANSWER: và bạn biết đó , nó không phải là thất bại với chính bản thân chúng tôi mà là thất bại sẽ ảnh hưởng đến suốt đời mario .\n", - "PREDICTED ANSWER: khi nếu muốn đó thuật việc đề được tìm thất bại khí ý đồng đen đó toàn mà cũng thất tác chọn ảnh hưởng đến suốt đời " qua nói " ấy tối phê trung xa qua tài xấu giá lý \n", - "\n", - "row 3\n", - "QUESTION: but the choice was theirs , and our audience quickly grew to choose the richest and most varied diet that we could provide .\n", - "REAL ANSWER: nhưng sự lựa chọn thuộc về chúng , và khán thính giả của chúng tôi tăng lên nhanh chóng để chọn những món " ăn kiêng " giàu nhất và đa dạng nhất mà chúng tôi có thể cung cấp .\n", - "PREDICTED ANSWER: nhưng chiếc lựa chọn thuộc về chúng dành nên khán thính sát nhật nói 1 tăng lên tuần chóng nên chọn lên món " mạo truyền hoá xã đến qua đem luật yêu rwanda xa rằng giành xấu giá giá " \n", - "\n", - "row 4\n", - "QUESTION: a freshly waxed car , the water molecules slump to about 90 degrees .\n", - "REAL ANSWER: một xe vừa bôi sáp , những phân tử nước sụt xuống gần ̣ 90 độ .\n", - "PREDICTED ANSWER: từ xe 
vừa bôi sáp xét từ khô tử nước sụt xuống gần ̣ trị độ hoặc nguồn đến nguồn đến dụ qua đi qua yêu trung trở trung xuôi tàu qua xấu vấn tàu xấu qua thực hiểu dụng " \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/51.bertmultilanguage-encoder-transformer-decoder.ipynb b/neural-machine-translation/51.bertmultilanguage-encoder-transformer-decoder.ipynb new file mode 100644 index 0000000..00c0f3b --- /dev/null +++ b/neural-machine-translation/51.bertmultilanguage-encoder-transformer-decoder.ipynb @@ -0,0 +1,1120 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip\n", + "# !unzip multi_cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + 
{ + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "BERT_VOCAB = 'multi_cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'multi_cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'multi_cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from unidecode import unidecode\n", + "\n", + "def get_inputs(x, y):\n", + " input_ids, input_masks, segment_ids, ys = [], [], [], []\n", + " for i in tqdm(range(len(x))):\n", + " tokens_a = tokenizer.tokenize(unidecode(x[i]))\n", + " tokens_b = tokenizer.tokenize(unidecode(y[i]))\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " \n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " \n", + " r = tokenizer.convert_tokens_to_ids(tokens_b) + [EOS]\n", + " if len([k for k in r if k == 0]):\n", + " print(y[i], i)\n", + " break\n", + " \n", + " ys.append(r)\n", + " \n", + " return input_ids, input_masks, segment_ids, ys" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 200000/200000 [02:46<00:00, 1202.17it/s]\n" + ] + } + ], + "source": [ + "train_input_ids, train_input_masks, train_segment_ids, train_Y = get_inputs(train_X, train_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 5000/5000 
[00:04<00:00, 1249.30it/s]\n" + ] + } + ], + "source": [ + "test_input_ids, test_input_masks, test_segment_ids, test_Y = get_inputs(test_X, test_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "epoch = 20\n", + "batch_size = 16\n", + "warmup_proportion = 0.1\n", + "num_train_steps = len(train_input_ids)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "119547" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bert_config.vocab_size" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "\n", + "BASE_PARAMS = defaultdict(\n", + " lambda: None, # Set default value to None.\n", + "\n", + " # Input params\n", + " default_batch_size=2048, # Maximum number of tokens per batch of examples.\n", + " default_batch_size_tpu=32768,\n", + " max_length=256, # Maximum number of tokens per example.\n", + "\n", + " # Model params\n", + " initializer_gain=1.0, # Used in trainable variable initialization.\n", + " vocab_size=32000, # Number of tokens defined in the vocabulary file.\n", + " hidden_size=768, # Model dimension in the hidden layers.\n", + " num_hidden_layers=6, # Number of layers in the encoder and decoder stacks.\n", + " num_heads=8, # Number of heads to use in multi-headed attention.\n", + " filter_size=2048, # Inner layer dimension in the feedforward network.\n", + "\n", + " # Dropout values (only used when training)\n", + " layer_postprocess_dropout=0.1,\n", + " attention_dropout=0.1,\n", + " relu_dropout=0.1,\n", + "\n", + " # Training params\n", + " label_smoothing=0.1,\n", + " learning_rate=1.0,\n", + " 
learning_rate_decay_rate=1.0,\n", + " learning_rate_warmup_steps=16000,\n", + "\n", + " # Optimizer params\n", + " optimizer_adam_beta1=0.9,\n", + " optimizer_adam_beta2=0.997,\n", + " optimizer_adam_epsilon=1e-09,\n", + "\n", + " # Default prediction params\n", + " extra_decode_length=50,\n", + " beam_size=4,\n", + " alpha=0.6, # used to calculate length normalization in beam search\n", + "\n", + " # TPU specific parameters\n", + " use_tpu=False,\n", + " static_batch=False,\n", + " allow_ffn_pad=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:24: The name tf.layers.Layer is deprecated. Please use tf.compat.v1.layers.Layer instead.\n", + "\n" + ] + } + ], + "source": [ + "from transformer import model_utils\n", + "from transformer import utils\n", + "from transformer.transformer import DecoderStack\n", + "from transformer import beam_search\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " learning_rate = 2e-5,\n", + " training = True,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=training,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " self.decoder_stack = DecoderStack(BASE_PARAMS, training)\n", + " attention_bias = model_utils.get_padding_bias(self.X)\n", 
+ " \n", + " output_layer = model.get_sequence_output()\n", + " pooled_output = model.get_pooled_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.name_scope(\"decode\"):\n", + " mask = tf.to_float(tf.not_equal(self.Y, 0))\n", + " decoder_inputs = tf.gather(embedding, self.Y)\n", + " decoder_inputs *= tf.expand_dims(mask, -1)\n", + " with tf.name_scope(\"shift_targets\"):\n", + " decoder_inputs = tf.pad(\n", + " decoder_inputs, [[0, 0], [1, 0], [0, 0]])[:, :-1, :]\n", + " with tf.name_scope(\"add_pos_encoding\"):\n", + " length = tf.shape(decoder_inputs)[1]\n", + " decoder_inputs += model_utils.get_position_encoding(\n", + " length, BASE_PARAMS[\"hidden_size\"])\n", + " if training:\n", + " decoder_inputs = tf.nn.dropout(\n", + " decoder_inputs, 1 - BASE_PARAMS[\"layer_postprocess_dropout\"])\n", + " decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(length)\n", + " outputs = self.decoder_stack(\n", + " decoder_inputs, output_layer, decoder_self_attention_bias,\n", + " attention_bias)\n", + " \n", + " with tf.variable_scope('cls/predictions'):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " outputs,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + "\n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " self.training_logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " print(self.training_logits)\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets 
= self.Y,\n", + " weights = masks)\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " def _get_symbols_to_logits_fn(max_decode_length):\n", + " timing_signal = model_utils.get_position_encoding(\n", + " max_decode_length + 1, BASE_PARAMS[\"hidden_size\"])\n", + " decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(\n", + " max_decode_length)\n", + " def symbols_to_logits_fn(ids, i, cache):\n", + " decoder_input = ids[:, -1:]\n", + " mask = tf.to_float(tf.not_equal(decoder_input, 0))\n", + " decoder_input = tf.gather(embedding, decoder_input)\n", + " decoder_input *= tf.expand_dims(mask, -1)\n", + " decoder_input += timing_signal[i:i + 1]\n", + " self_attention_bias = decoder_self_attention_bias[:, :, i:i + 1, :i + 1]\n", + " decoder_outputs = self.decoder_stack(\n", + " decoder_input, cache.get(\"encoder_outputs\"), self_attention_bias,\n", + " cache.get(\"encoder_decoder_attention_bias\"), cache)\n", + " \n", + " with tf.variable_scope('cls/predictions', reuse = True):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " decoder_outputs,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + "\n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = 
[bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " logits = tf.squeeze(logits, axis=[1])\n", + " return logits, cache\n", + " return symbols_to_logits_fn\n", + " \n", + " batch_size = tf.shape(output_layer)[0]\n", + " input_length = tf.shape(output_layer)[1]\n", + " max_decode_length = input_length + BASE_PARAMS[\"extra_decode_length\"]\n", + " symbols_to_logits_fn = _get_symbols_to_logits_fn(max_decode_length)\n", + " initial_ids = tf.zeros([batch_size], dtype=tf.int32)\n", + " cache = {\n", + " \"layer_%d\" % layer: {\n", + " \"k\": tf.zeros([batch_size, 0, BASE_PARAMS[\"hidden_size\"]]),\n", + " \"v\": tf.zeros([batch_size, 0, BASE_PARAMS[\"hidden_size\"]]),\n", + " } for layer in range(BASE_PARAMS[\"num_hidden_layers\"])}\n", + " cache[\"encoder_outputs\"] = output_layer\n", + " cache[\"encoder_decoder_attention_bias\"] = attention_bias\n", + " \n", + " decoded_ids, scores = beam_search.sequence_beam_search(\n", + " symbols_to_logits_fn=symbols_to_logits_fn,\n", + " initial_ids=initial_ids,\n", + " initial_cache=cache,\n", + " vocab_size=bert_config.vocab_size,\n", + " beam_size=1,\n", + " alpha=BASE_PARAMS[\"alpha\"],\n", + " max_decode_length=max_decode_length,\n", + " eos_id=EOS)\n", + " \n", + " top_decoded_ids = decoded_ids[:, 0, 1:]\n", + " self.fast_result = top_decoded_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:39: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:89: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:71: The name tf.matrix_band_part is deprecated. Please use tf.linalg.band_part instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.\n", + "\n", + "Tensor(\"cls/predictions/MatMul:0\", shape=(?, ?, 119547), dtype=float32)\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. 
Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/beam_search.py:420: calling reduce_logsumexp_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "keep_dims is deprecated, use keepdims instead\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if 'bert/' + name in name_to_variable:\n", + " assignment_map[name] = 
name_to_variable['bert/' + name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " elif name in name_to_variable:\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " \n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "\n", + "checkpoint = BERT_INIT_CHKPNT\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from multi_cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "i = 0\n", + "index = min(i + batch_size, len(train_input_ids))\n", + "batch_x = train_input_ids[i: index]\n", + "batch_x = pad_sequences(batch_x, padding='post')\n", + "batch_mask = train_input_masks[i: index]\n", + "batch_mask = pad_sequences(batch_mask, padding='post')\n", + "batch_segment = train_segment_ids[i: index]\n", + "batch_segment = pad_sequences(batch_segment, padding='post')\n", + "batch_y = pad_sequences(train_Y[i: index], padding='post')\n", + "acc, cost, r = sess.run(\n", + " [model.accuracy, model.cost, model.fast_result],\n", + " feed_dict = {\n", + " model.Y: batch_y,\n", + " model.X: 
batch_x,\n", + " model.input_masks: batch_mask,\n", + " model.segment_ids: batch_segment\n", + " },\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 12500/12500 [1:13:53<00:00, 2.82it/s, accuracy=0.244, cost=4.65]\n", + "test minibatch loop: 100%|██████████| 313/313 [00:43<00:00, 7.23it/s, accuracy=0.224, cost=4.46]\n", + "train minibatch loop: 0%| | 0/12500 [00:00 1 and i not in [101, 102]])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 1 and i not in [101, 102]])" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.3941662" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/51.gru-birnn-seq2seq-greedy-residual.ipynb b/neural-machine-translation/51.gru-birnn-seq2seq-greedy-residual.ipynb deleted file mode 100644 index ec0945e..0000000 --- a/neural-machine-translation/51.gru-birnn-seq2seq-greedy-residual.ipynb +++ /dev/null @@ -1,448 +0,0 @@ 
-{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['tôi tiếp tục làm thí nghiệm này 1 thời gian',\n", - " 'và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .']" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": 
[ - "text_to[-2:]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = 
dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " batch_size, dropout = 0.5, beam_width = 15):\n", - " \n", - " def cell(size, residual, reuse=False):\n", - " c = tf.nn.rnn_cell.GRUCell(size, reuse=reuse)\n", - " if residual:\n", - " c = tf.nn.rnn_cell.ResidualWrapper(c)\n", - " return c\n", - " \n", - " def cells(size, residual = 2):\n", - " cell_list = []\n", - " for i in range(num_layers):\n", - " cell_list.append(cell(size, (i >= num_layers - residual)))\n", - " return cell_list\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " # encoder\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = tf.nn.rnn_cell.MultiRNNCell(cells(size_layer // 2)),\n", - " cell_bw = tf.nn.rnn_cell.MultiRNNCell(cells(size_layer // 2)),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " bi_state = tf.concat((state_fw[-1], state_bw[-1]), -1)\n", - " self.encoder_state = tuple([bi_state] * num_layers)\n", - " \n", 
- " self.encoder_state = tuple(self.encoder_state[-1] for _ in range(num_layers))\n", - " print(self.encoder_state)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " # decoder\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell(cells(size_layer))\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = self.encoder_state,\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), 
dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(, , , )\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - 
" padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.534297, avg accuracy: 0.061955\n", - "epoch: 2, avg loss: 6.023520, avg accuracy: 0.079758\n", - "epoch: 3, avg loss: 5.883783, avg accuracy: 0.083657\n", - "epoch: 4, avg loss: 5.692253, avg accuracy: 0.103227\n", - "epoch: 5, avg loss: 5.532007, avg accuracy: 0.112095\n", - "epoch: 6, avg loss: 5.364343, avg accuracy: 0.116559\n", - "epoch: 7, avg loss: 5.218357, avg accuracy: 0.124333\n", - "epoch: 8, avg loss: 5.048472, avg accuracy: 0.129905\n", - "epoch: 9, avg loss: 4.805836, avg accuracy: 0.142372\n", - "epoch: 10, avg loss: 4.560356, avg accuracy: 0.160021\n", - "epoch: 11, avg loss: 4.312730, avg accuracy: 0.180998\n", - "epoch: 12, avg loss: 4.061545, avg accuracy: 0.211403\n", - "epoch: 13, avg loss: 3.797839, avg accuracy: 0.240503\n", - "epoch: 14, avg loss: 3.492597, avg accuracy: 0.281195\n", - "epoch: 15, avg loss: 3.154245, avg accuracy: 0.332226\n", - "epoch: 16, avg loss: 2.872106, avg accuracy: 0.380268\n", - "epoch: 17, avg loss: 2.617850, avg accuracy: 0.430409\n", - "epoch: 18, avg loss: 2.333587, avg accuracy: 0.484457\n", - "epoch: 19, avg loss: 2.110324, avg accuracy: 0.524874\n", - "epoch: 20, avg loss: 1.941927, avg accuracy: 0.561457\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: 
index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: và tôi cố gia , tôi sẽ thực hiện từ thời đó , nhưng nhưng thật thật ra , vì vì tôi phải phải ra , và bé có một sinh sinh sinh sinh học học phải phải phải cộng cộng cộng cộng đồng đồng cộng đồng , tôi có \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: và tôi muốn luyện tập cho bạn cách huýt gió khắp . 
\n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi cố gia , tôi sẽ thực hiện từ thời đó , nhưng nhưng thật thật ra , vì vì tôi phải phải ra , và bé có một sinh sinh sinh sinh học học phải phải phải cộng cộng cộng cộng đồng đồng cộng đồng , tôi có \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và tôi cố sống giấc mơ của tôi -- nhưng thực ra , nhưng đó , nhưng thật ra , và quả quả quả quả của con con của của mình , chúng ta có thể nhìn vào một cụ trí và và và não và nhìn nhìn trong nhìn \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/52.bertenglish-encoder-transformer-decoder.ipynb 
b/neural-machine-translation/52.bertenglish-encoder-transformer-decoder.ipynb new file mode 100644 index 0000000..e24249c --- /dev/null +++ b/neural-machine-translation/52.bertenglish-encoder-transformer-decoder.ipynb @@ -0,0 +1,1301 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenizer = 
tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 1" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "import youtokentome as yttm\n", + "\n", + "bpe = yttm.BPE(model='bpe.model')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from unidecode import unidecode\n", + "from tqdm import tqdm\n", + "\n", + "def get_inputs(x, y):\n", + " input_ids, input_masks, segment_ids, ys = [], [], [], []\n", + " for i in tqdm(range(len(x))):\n", + " tokens_a = tokenizer.tokenize(unidecode(x[i]))\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " \n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " \n", + " r = bpe.encode(y[i], output_type=yttm.OutputType.ID) + [EOS]\n", + " if len([k for k in r if k == 0]):\n", + " print(y[i], i)\n", + " break\n", + " \n", + " ys.append(r)\n", + " \n", + " return input_ids, input_masks, segment_ids, ys" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 200000/200000 [01:24<00:00, 2359.12it/s]\n" + ] + } + ], + "source": [ + "train_input_ids, train_input_masks, train_segment_ids, train_Y = get_inputs(train_X, train_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 5000/5000 [00:01<00:00, 2711.31it/s]\n" + ] + } + ], + "source": [ + "test_input_ids, test_input_masks, test_segment_ids, test_Y = 
get_inputs(test_X, test_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "epoch = 20\n", + "batch_size = 16\n", + "warmup_proportion = 0.1\n", + "num_train_steps = len(train_input_ids)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30522" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bert_config.vocab_size" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "\n", + "BASE_PARAMS = defaultdict(\n", + " lambda: None, # Set default value to None.\n", + "\n", + " # Input params\n", + " default_batch_size=2048, # Maximum number of tokens per batch of examples.\n", + " default_batch_size_tpu=32768,\n", + " max_length=256, # Maximum number of tokens per example.\n", + "\n", + " # Model params\n", + " initializer_gain=1.0, # Used in trainable variable initialization.\n", + " vocab_size=32000, # Number of tokens defined in the vocabulary file.\n", + " hidden_size=768, # Model dimension in the hidden layers.\n", + " num_hidden_layers=6, # Number of layers in the encoder and decoder stacks.\n", + " num_heads=8, # Number of heads to use in multi-headed attention.\n", + " filter_size=2048, # Inner layer dimension in the feedforward network.\n", + "\n", + " # Dropout values (only used when training)\n", + " layer_postprocess_dropout=0.1,\n", + " attention_dropout=0.1,\n", + " relu_dropout=0.1,\n", + "\n", + " # Training params\n", + " label_smoothing=0.1,\n", + " learning_rate=1.0,\n", + " learning_rate_decay_rate=1.0,\n", + " learning_rate_warmup_steps=16000,\n", + "\n", + " # Optimizer params\n", + " 
optimizer_adam_beta1=0.9,\n", + " optimizer_adam_beta2=0.997,\n", + " optimizer_adam_epsilon=1e-09,\n", + "\n", + " # Default prediction params\n", + " extra_decode_length=50,\n", + " beam_size=4,\n", + " alpha=0.6, # used to calculate length normalization in beam search\n", + "\n", + " # TPU specific parameters\n", + " use_tpu=False,\n", + " static_batch=False,\n", + " allow_ffn_pad=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "from transformer import model_utils\n", + "from transformer import utils\n", + "from transformer.transformer import DecoderStack\n", + "from transformer import beam_search\n", + "from transformer import embedding_layer\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " learning_rate = 2e-5,\n", + " training = True,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=training,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " self.decoder_stack = DecoderStack(BASE_PARAMS, training)\n", + " attention_bias = model_utils.get_padding_bias(self.X)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " \n", + " self.embedding_softmax_layer = embedding_layer.EmbeddingSharedWeights(\n", + " BASE_PARAMS[\"vocab_size\"], BASE_PARAMS[\"hidden_size\"], 'gather')\n", + " \n", + " with tf.name_scope(\"decode\"):\n", + " decoder_inputs = 
self.embedding_softmax_layer(self.Y)\n", + " with tf.name_scope(\"shift_targets\"):\n", + " decoder_inputs = tf.pad(\n", + " decoder_inputs, [[0, 0], [1, 0], [0, 0]])[:, :-1, :]\n", + " with tf.name_scope(\"add_pos_encoding\"):\n", + " length = tf.shape(decoder_inputs)[1]\n", + " decoder_inputs += model_utils.get_position_encoding(\n", + " length, BASE_PARAMS[\"hidden_size\"])\n", + " if training:\n", + " decoder_inputs = tf.nn.dropout(\n", + " decoder_inputs, 1 - BASE_PARAMS[\"layer_postprocess_dropout\"])\n", + " decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(length)\n", + " outputs = self.decoder_stack(\n", + " decoder_inputs, output_layer, decoder_self_attention_bias,\n", + " attention_bias)\n", + " self.training_logits = self.embedding_softmax_layer.linear(outputs)\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " def _get_symbols_to_logits_fn(max_decode_length):\n", + " timing_signal = model_utils.get_position_encoding(\n", + " max_decode_length + 1, BASE_PARAMS[\"hidden_size\"])\n", + " decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(\n", + " max_decode_length)\n", + " def symbols_to_logits_fn(ids, i, cache):\n", + " decoder_input = ids[:, -1:]\n", + " decoder_input = 
self.embedding_softmax_layer(decoder_input)\n", + " decoder_input += timing_signal[i:i + 1]\n", + " self_attention_bias = decoder_self_attention_bias[:, :, i:i + 1, :i + 1]\n", + " decoder_outputs = self.decoder_stack(\n", + " decoder_input, cache.get(\"encoder_outputs\"), self_attention_bias,\n", + " cache.get(\"encoder_decoder_attention_bias\"), cache)\n", + " logits = self.embedding_softmax_layer.linear(decoder_outputs)\n", + " logits = tf.squeeze(logits, axis=[1])\n", + " return logits, cache\n", + " return symbols_to_logits_fn\n", + " \n", + " batch_size = tf.shape(output_layer)[0]\n", + " input_length = tf.shape(output_layer)[1]\n", + " max_decode_length = input_length + BASE_PARAMS[\"extra_decode_length\"]\n", + " symbols_to_logits_fn = _get_symbols_to_logits_fn(max_decode_length)\n", + " initial_ids = tf.zeros([batch_size], dtype=tf.int32)\n", + " cache = {\n", + " \"layer_%d\" % layer: {\n", + " \"k\": tf.zeros([batch_size, 0, BASE_PARAMS[\"hidden_size\"]]),\n", + " \"v\": tf.zeros([batch_size, 0, BASE_PARAMS[\"hidden_size\"]]),\n", + " } for layer in range(BASE_PARAMS[\"num_hidden_layers\"])}\n", + " cache[\"encoder_outputs\"] = output_layer\n", + " cache[\"encoder_decoder_attention_bias\"] = attention_bias\n", + " \n", + " decoded_ids, scores = beam_search.sequence_beam_search(\n", + " symbols_to_logits_fn=symbols_to_logits_fn,\n", + " initial_ids=initial_ids,\n", + " initial_cache=cache,\n", + " vocab_size=32000,\n", + " beam_size=1,\n", + " alpha=BASE_PARAMS[\"alpha\"],\n", + " max_decode_length=max_decode_length,\n", + " eos_id=EOS)\n", + " \n", + " top_decoded_ids = decoded_ids[:, 0, 1:]\n", + " self.fast_result = top_decoded_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. 
Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/translation/transformer/attention_layer.py:39: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:89: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/embedding_layer.py:48: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/model_utils.py:71: The name tf.matrix_band_part is deprecated. 
Please use tf.linalg.band_part instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.rsqrt is deprecated. Please use tf.math.rsqrt instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/translation/transformer/beam_search.py:420: calling reduce_logsumexp_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "keep_dims is deprecated, use keepdims instead\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " 
\"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if 'bert/' + name in name_to_variable:\n", + " assignment_map[name] = name_to_variable['bert/' + name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " elif name in name_to_variable:\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + " \n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "\n", + "checkpoint = BERT_INIT_CHKPNT\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + 
"metadata": {}, + "outputs": [], + "source": [ + "i = 0\n", + "index = min(i + batch_size, len(train_input_ids))\n", + "batch_x = train_input_ids[i: index]\n", + "batch_x = pad_sequences(batch_x, padding='post')\n", + "batch_mask = train_input_masks[i: index]\n", + "batch_mask = pad_sequences(batch_mask, padding='post')\n", + "batch_segment = train_segment_ids[i: index]\n", + "batch_segment = pad_sequences(batch_segment, padding='post')\n", + "batch_y = pad_sequences(train_Y[i: index], padding='post')\n", + "acc, cost, r = sess.run(\n", + " [model.accuracy, model.cost, model.fast_result],\n", + " feed_dict = {\n", + " model.Y: batch_y,\n", + " model.X: batch_x,\n", + " model.input_masks: batch_mask,\n", + " model.segment_ids: batch_segment\n", + " },\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 12500/12500 [44:21<00:00, 4.70it/s, accuracy=0.155, cost=6.27] \n", + "test minibatch loop: 100%|██████████| 313/313 [00:24<00:00, 12.73it/s, accuracy=0.242, cost=5.41]\n", + "train minibatch loop: 0%| | 0/12500 [00:00 1 and i not in [101, 102]])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 1 and i not in [101, 102]])" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.23225775" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": 
"ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/52.google-nmt.ipynb b/neural-machine-translation/52.google-nmt.ipynb deleted file mode 100644 index 4e56913..0000000 --- a/neural-machine-translation/52.google-nmt.ipynb +++ /dev/null @@ -1,528 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = 
fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - 
"PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "from tensorflow.python.util import nest\n", - "from tensorflow.python.layers.core import Dense\n", - "\n", - "def gnmt_residual_fn(inputs, outputs):\n", - " def split_input(inp, out):\n", - " out_dim = out.get_shape().as_list()[-1]\n", - " inp_dim = inp.get_shape().as_list()[-1]\n", - " return tf.split(inp, [out_dim, inp_dim - out_dim], axis=-1)\n", - " actual_inputs, _ = nest.map_structure(split_input, inputs, outputs)\n", - "\n", - " def assert_shape_match(inp, out):\n", - " inp.get_shape().assert_is_compatible_with(out.get_shape())\n", - " nest.assert_same_structure(actual_inputs, outputs)\n", - " nest.map_structure(assert_shape_match, actual_inputs, outputs)\n", - " return nest.map_structure(lambda inp, out: inp + out, actual_inputs, outputs)\n", - "\n", - "class GNMTAttentionMultiCell(tf.nn.rnn_cell.MultiRNNCell):\n", - "\n", - " def __init__(self, attention_cell, cells, use_new_attention=True):\n", - " cells = [attention_cell] + cells\n", - " self.use_new_attention = use_new_attention\n", - " super(GNMTAttentionMultiCell, self).__init__(\n", - " cells, state_is_tuple=True)\n", - "\n", - " def __call__(self, inputs, state, scope=None):\n", - " \"\"\"Run the cell with bottom layer's attention copied to all upper layers.\"\"\"\n", - " if not nest.is_sequence(state):\n", - " raise ValueError(\n", - " \"Expected state to be a tuple of length %d, but received: %s\"\n", - " % (len(self.state_size), state))\n", - "\n", - " with tf.variable_scope(scope or \"multi_rnn_cell\"):\n", - " new_states = []\n", - "\n", - " with tf.variable_scope(\"cell_0_attention\"):\n", - " attention_cell = 
self._cells[0]\n", - " attention_state = state[0]\n", - " cur_inp, new_attention_state = attention_cell(\n", - " inputs, attention_state)\n", - " new_states.append(new_attention_state)\n", - "\n", - " for i in range(1, len(self._cells)):\n", - " with tf.variable_scope(\"cell_%d\" % i):\n", - " cell = self._cells[i]\n", - " cur_state = state[i]\n", - "\n", - " if self.use_new_attention:\n", - " cur_inp = tf.concat(\n", - " [cur_inp, new_attention_state.attention], -1)\n", - " else:\n", - " cur_inp = tf.concat(\n", - " [cur_inp, attention_state.attention], -1)\n", - "\n", - " cur_inp, new_state = cell(cur_inp, cur_state)\n", - " new_states.append(new_state)\n", - "\n", - " return cur_inp, tuple(new_states)\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, beam_width = 15):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " num_residual_layer = num_layers - 2\n", - " num_bi_layer = 1\n", - " num_ui_layer = num_layers - num_bi_layer\n", - "\n", - " for n in range(num_bi_layer):\n", - " (out_fw, 
out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layer),\n", - " cell_bw = cells(size_layer),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " gru_cells = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_ui_layer)])\n", - " encoder_outputs, encoder_state = tf.nn.dynamic_rnn(\n", - " gru_cells,\n", - " encoder_embedded,\n", - " dtype=tf.float32,\n", - " sequence_length=self.X_seq_len)\n", - " \n", - " encoder_state = (state_bw,) + (\n", - " (encoder_state,) if num_ui_layer == 1 else encoder_state)\n", - " \n", - " decoder_cells = []\n", - " for n in range(num_layers):\n", - " cell = cells(size_layer)\n", - " if (n >= num_layers - num_residual_layer):\n", - " cell = tf.nn.rnn_cell.ResidualWrapper(cell, residual_fn = gnmt_residual_fn)\n", - " decoder_cells.append(cell)\n", - " attention_cell = decoder_cells.pop(0)\n", - " to_dense = tf.layers.Dense(to_dict_size)\n", - " \n", - " with tf.variable_scope('decode'):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_outputs,\n", - " memory_sequence_length = self.X_seq_len)\n", - " att_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = attention_cell,\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = None,\n", - " alignment_history = True,\n", - " output_attention = False)\n", - " gcell = GNMTAttentionMultiCell(att_cell, decoder_cells)\n", - " \n", - " self.initial_state = tuple(\n", - " zs.clone(cell_state=es)\n", - " if isinstance(zs, tf.contrib.seq2seq.AttentionWrapperState) else es\n", - " for zs, es in zip(\n", - " gcell.zero_state(batch_size, dtype=tf.float32), encoder_state))\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " 
decoder_embedded,\n", - " self.Y_seq_len,\n", - " time_major = False\n", - " )\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = gcell,\n", - " helper = training_helper,\n", - " initial_state = self.initial_state,\n", - " output_layer = to_dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " with tf.variable_scope('decode', reuse=True):\n", - " encoder_out_tiled = tf.contrib.seq2seq.tile_batch(encoder_outputs, beam_width)\n", - " encoder_state_tiled = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width)\n", - " X_seq_len_tiled = tf.contrib.seq2seq.tile_batch(self.X_seq_len, beam_width)\n", - " \n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", - " num_units = size_layer, \n", - " memory = encoder_out_tiled,\n", - " memory_sequence_length = X_seq_len_tiled)\n", - " att_cell = tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = attention_cell,\n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = None,\n", - " alignment_history = False,\n", - " output_attention = False)\n", - " gcell = GNMTAttentionMultiCell(att_cell, decoder_cells)\n", - " \n", - " self.initial_state = tuple(\n", - " zs.clone(cell_state=es)\n", - " if isinstance(zs, tf.contrib.seq2seq.AttentionWrapperState) else es\n", - " for zs, es in zip(\n", - " gcell.zero_state(batch_size * beam_width, dtype=tf.float32), encoder_state_tiled))\n", - " \n", - " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", - " cell = gcell,\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS,\n", - " initial_state = self.initial_state,\n", - " beam_width = beam_width,\n", - " output_layer = to_dense,\n", - " 
length_penalty_weight = 0.0)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = False,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.predicted_ids[:, :, 0]\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - 
"execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.548914, avg accuracy: 0.050564\n", - "epoch: 2, avg loss: 6.038840, avg accuracy: 0.083646\n", - "epoch: 3, avg loss: 5.889536, avg accuracy: 0.107929\n", - "epoch: 4, avg loss: 5.777255, avg accuracy: 0.116488\n", - "epoch: 5, avg loss: 5.656294, avg accuracy: 0.129913\n", - "epoch: 6, avg loss: 5.469441, avg accuracy: 0.143000\n", - "epoch: 7, avg loss: 5.211525, avg accuracy: 0.166089\n", - "epoch: 8, avg loss: 4.946451, avg accuracy: 0.186357\n", - "epoch: 9, avg loss: 4.646811, avg accuracy: 0.206599\n", - "epoch: 10, avg loss: 4.357532, avg accuracy: 0.233228\n", - "epoch: 11, avg loss: 4.068010, avg accuracy: 0.261730\n", - "epoch: 12, avg loss: 3.767738, avg accuracy: 0.296694\n", - "epoch: 13, avg loss: 3.490741, avg accuracy: 0.335230\n", - "epoch: 14, avg loss: 3.218014, avg accuracy: 0.376315\n", - "epoch: 15, avg loss: 2.915447, avg accuracy: 0.438789\n", - "epoch: 16, avg loss: 2.652851, avg accuracy: 0.481870\n", - "epoch: 17, avg loss: 2.396472, avg accuracy: 0.532073\n", - "epoch: 18, avg loss: 2.145722, avg accuracy: 0.581913\n", - "epoch: 19, avg loss: 1.923750, avg accuracy: 0.628581\n", - "epoch: 20, avg loss: 1.705201, avg 
accuracy: 0.675990\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(X), batch_size):\n", - " index = min(k + batch_size, len(X))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD)\n", - " predicted, accuracy, loss, _ = sess.run([model.predicting_ids,\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(X) / batch_size)\n", - " total_accuracy /= (len(X) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: so we were pretty reassured by this .\n", - "REAL ANSWER: nên chúng tôi khá an tâm .\n", - "PREDICTED ANSWER: nên chúng chúng rất chúng thực . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
\n", - "\n", - "row 2\n", - "QUESTION: it 's a lot of water-based materials like concrete , water-based paint , mud , and also some refined oils as well .\n", - "REAL ANSWER: còn rất nhiều chất có nước như bê tông , sơn có chứa nước , bùn , và một số loại dầu tinh chế nữa .\n", - "PREDICTED ANSWER: như rất , như chống , như chống , như rất , rất , nó tông chống , \n", - "\n", - "row 3\n", - "QUESTION: so the effect that this stroke could have on mario 's body could be the fact that he couldn 't be able to control the left side of his body .\n", - "REAL ANSWER: hậu quả của cú đột quỵ đối với cơ thể của mario có thể tệ đến mức mario sẽ không còn có thể sử dụng được phần cơ thể bên trái nữa .\n", - "PREDICTED ANSWER: hậu quỵ , này bạn ra ra , nào ra , nào ra , nào ra , nào ra , nào ra , nào ra , nào ra , nào ra , nào ra , này . là , , nào ra , \n", - "\n", - "row 4\n", - "QUESTION: rachel pike : the science behind a climate headline\n", - "REAL ANSWER: khoa học đằng sau một tiêu đề về khí hậu\n", - "PREDICTED ANSWER: khoa khoa thuật nào ra , \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - 
"nbformat_minor": 2 -} diff --git a/neural-machine-translation/53.dilated-seq2seq.ipynb b/neural-machine-translation/53.dilated-seq2seq.ipynb deleted file mode 100644 index d7d99af..0000000 --- a/neural-machine-translation/53.dilated-seq2seq.ipynb +++ /dev/null @@ -1,554 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { 
- "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ 
- "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def embed_seq(x, vocab_sz, embed_dim, name, zero_pad=True): \n", - " embedding = tf.get_variable(name, [vocab_sz, embed_dim]) \n", - " if zero_pad:\n", - " embedding = tf.concat([tf.zeros([1, embed_dim]), embedding[1:, :]], 0) \n", - " x = tf.nn.embedding_lookup(embedding, x)\n", - " return x\n", - "\n", - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " from_dict_size, to_dict_size, learning_rate, \n", - " kernel_size = 2, 
n_attn_heads = 16):\n", - "\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - "\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " def forward(x, y, reuse=False):\n", - " with tf.variable_scope('forward',reuse=reuse):\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embedding, x)\n", - " decoder_embedded = tf.nn.embedding_lookup(decoder_embedding, y)\n", - " \n", - " encoder_embedded += position_encoding(encoder_embedded)\n", - " for i in range(num_layers): \n", - " dilation_rate = 2 ** i\n", - " pad_sz = (kernel_size - 1) * dilation_rate \n", - " with tf.variable_scope('block_%d'%i,reuse=reuse):\n", - " encoder_embedded += cnn_block(encoder_embedded, dilation_rate, \n", - " pad_sz, size_layer, kernel_size)\n", - " \n", - " g = tf.identity(decoder_embedded)\n", - " for i in range(num_layers):\n", - " dilation_rate = 2 ** i\n", - " pad_sz = (kernel_size - 1) * dilation_rate\n", - " with tf.variable_scope('decode_%d'%i,reuse=reuse):\n", - " attn_res = h = cnn_block(decoder_embedded, dilation_rate, \n", - " pad_sz, size_layer, kernel_size)\n", - " C = []\n", - " for j in range(n_attn_heads):\n", - " h_ = tf.layers.dense(h, size_layer//n_attn_heads)\n", - " g_ = tf.layers.dense(g, size_layer//n_attn_heads)\n", - " zu_ = tf.layers.dense(encoder_embedded, size_layer//n_attn_heads)\n", - " ze_ = tf.layers.dense(encoder_embedded, size_layer//n_attn_heads)\n", - "\n", - " d = tf.layers.dense(h_, 
size_layer//n_attn_heads) + g_\n", - " dz = tf.matmul(d, tf.transpose(zu_, [0, 2, 1]))\n", - " a = tf.nn.softmax(dz)\n", - " c_ = tf.matmul(a, ze_)\n", - " C.append(c_)\n", - " \n", - " c = tf.concat(C, 2)\n", - " h = tf.layers.dense(attn_res + c, size_layer)\n", - " decoder_embedded += h\n", - " return tf.layers.dense(decoder_embedded, to_dict_size)\n", - " \n", - " self.training_logits = forward(self.X, decoder_input)\n", - " self.logits = forward(self.X, self.Y, reuse=True)\n", - " self.k = tf.placeholder(dtype = tf.int32)\n", - " p = tf.nn.softmax(self.logits)\n", - " self.topk_logprobs, self.topk_ids = tf.nn.top_k(tf.log(p), self.k)\n", - "\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) " - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 1e-3\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, \n", - " len(dictionary_from), len(dictionary_to), learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - 
"cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 7.138729, avg accuracy: 0.060734\n", - "epoch: 2, avg loss: 5.717680, avg accuracy: 0.115421\n", - "epoch: 3, avg loss: 5.065214, avg accuracy: 0.167811\n", - "epoch: 4, avg loss: 4.467808, avg accuracy: 0.229607\n", - "epoch: 5, avg loss: 3.844215, avg accuracy: 0.298709\n", - "epoch: 6, avg loss: 3.283214, avg accuracy: 0.371743\n", - "epoch: 7, avg loss: 2.696358, avg accuracy: 0.473615\n", - "epoch: 8, avg loss: 2.230349, avg accuracy: 0.554017\n", - "epoch: 9, avg loss: 1.778437, avg accuracy: 0.643466\n", - "epoch: 10, avg loss: 1.458658, avg accuracy: 0.694569\n", - "epoch: 11, avg loss: 1.092220, avg accuracy: 0.770734\n", - "epoch: 12, avg loss: 0.799111, avg accuracy: 0.839734\n", - "epoch: 13, avg loss: 0.606233, avg accuracy: 0.883996\n", - "epoch: 14, avg loss: 0.413571, avg accuracy: 0.931439\n", - "epoch: 15, avg loss: 0.257403, avg accuracy: 0.970123\n", - "epoch: 16, avg loss: 0.147818, avg accuracy: 0.999952\n", 
- "epoch: 17, avg loss: 0.072431, avg accuracy: 1.013340\n", - "epoch: 18, avg loss: 0.033687, avg accuracy: 1.021390\n", - "epoch: 19, avg loss: 0.016774, avg accuracy: 1.023093\n", - "epoch: 20, avg loss: 0.009440, avg accuracy: 1.023615\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " accuracy,loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "class Hypothesis:\n", - " def __init__(self, log_prob, seq):\n", - " self.log_prob = log_prob\n", - " self.seq = seq\n", - "\n", - " @property\n", - " def step(self):\n", - " return len(self.seq) - 1\n", - "\n", - "\n", - "def beam_search(\n", - " batch_x,\n", - " beam_size,\n", - " num_ans = 50,\n", - " normalize_by_len = 1.0,\n", - "):\n", - " assert 0 <= normalize_by_len <= 1\n", - " batch_size = len(batch_x)\n", - " max_len = len(batch_x[0]) * 2\n", - " dec_inputs = np.ones((batch_size, 2), dtype=np.int32)\n", - " answers = [[] for i in range(batch_size)]\n", - " H = [[] for i in range(batch_size)]\n", - " \n", - " tkl, tkid = sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " for i in range(batch_size):\n", - " for j, log_prob in enumerate(tkl[i, 0]):\n", - " if tkid[i, 0, j] != EOS:\n", - 
" h = Hypothesis(log_prob, [1, tkid[i, 0, j]])\n", - " H[i].append(h)\n", - " H[i].sort(key=lambda h: h.log_prob)\n", - " \n", - " done = [False] * batch_size\n", - " while not all(done):\n", - " tkl_beam = []\n", - " tkid_beam = []\n", - " dec_inputs_beam = []\n", - " steps_beam = []\n", - " for i in range(beam_size):\n", - " steps = [1] * batch_size\n", - " prev_log_probs = np.zeros(batch_size, dtype=np.float32)\n", - " dec_inputs = np.ones((batch_size, max_len), dtype=np.int32)\n", - " for j, h in enumerate(H):\n", - " while h:\n", - " hi = h.pop()\n", - " lp, step, candidate_seq = hi.log_prob, hi.step, hi.seq\n", - " if candidate_seq[-1] != EOS:\n", - " dec_inputs[j, :len(candidate_seq)] = candidate_seq\n", - " steps[j] = step\n", - " prev_log_probs[j] = lp\n", - " break\n", - " else:\n", - " answers[j].append((lp, candidate_seq))\n", - " max_step = max(steps)\n", - " dec_inputs = dec_inputs[:, :max_step + 2]\n", - " tkl, tkid = sess.run([model.topk_logprobs, \n", - " model.topk_ids],\n", - " feed_dict = {model.X: batch_x,\n", - " model.Y: dec_inputs,\n", - " model.k: beam_size})\n", - " tkl_beam.append(tkl + prev_log_probs[:, None, None])\n", - " tkid_beam.append(tkid)\n", - " dec_inputs_beam.append(dec_inputs.copy())\n", - " steps_beam.append(steps)\n", - " \n", - " for i in range(beam_size):\n", - " tkl = tkl_beam[i]\n", - " tkid = tkid_beam[i]\n", - " dec_inputs = dec_inputs_beam[i]\n", - " steps = steps_beam[i]\n", - " for j in range(batch_size):\n", - " step = steps[j]\n", - " for k in range(tkid.shape[2]):\n", - " extended_seq = np.hstack((dec_inputs[j, :step+1], [tkid[j, step, k]]))\n", - " log_prob = tkl[j, step, k]\n", - " if len(extended_seq) <= max_len and log_prob > -10:\n", - " h = Hypothesis(log_prob, extended_seq)\n", - " H[j].append(h)\n", - " H[j].sort(key=lambda h: h.log_prob / (h.step**normalize_by_len))\n", - " \n", - " for i in range(batch_size):\n", - " done[i] = (len(answers[i]) >= num_ans) or (not H[i]) or (len(H[i]) > 100)\n", - " \n", 
- " return answers" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "beamed = beam_search(batch_x, 5)" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "beamed = [i for i in beamed if len(i)]" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "predicted = [max(b, key = lambda t: t[0])[1] for b in beamed]" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(predicted)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/53.transformer-t2t-2gpu.ipynb b/neural-machine-translation/53.transformer-t2t-2gpu.ipynb new file mode 100644 index 0000000..e28e49f --- /dev/null +++ b/neural-machine-translation/53.transformer-t2t-2gpu.ipynb @@ -0,0 +1,2228 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.data_generators import problem\n", + "from tensor2tensor.data_generators import text_problems\n", + "from tensor2tensor.data_generators import translate\n", + "from tensor2tensor.utils import registry" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"os.system('rm -rf t2t')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "TRAIN_DATASETS = [\n", + "    [\n", + "        \"https://f000.backblazeb2.com/file/malay-dataset/train-translation.tar.gz\", \n", + "        (\"train/before.txt\",\n", + "         \"train/after.txt\")\n", + "    ]\n", + "]\n", + "\n", + "TEST_DATASETS = [\n", + "    [\n", + "        \"https://f000.backblazeb2.com/file/malay-dataset/test-translation.tar.gz\", \n", + "        (\"test/before.txt\",\n", + "         \"test/after.txt\")\n", + "    ]\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "@registry.register_problem\n", + "class Paraphrase32k(translate.TranslateProblem):\n", + "  \"\"\"Malay paraphrase problem trained on the malay-dataset corpus.\"\"\"\n", + "\n", + "  @property\n", + "  def additional_training_datasets(self):\n", + "    \"\"\"Allow subclasses to add training datasets.\"\"\"\n", + "    return []\n", + "\n", + "  def source_data_files(self, dataset_split):\n", + "    train = dataset_split == problem.DatasetSplit.TRAIN\n", + "    train_datasets = TRAIN_DATASETS + self.additional_training_datasets\n", + "    return train_datasets if train else TEST_DATASETS" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import registry\n", + "from tensor2tensor import problems" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import os\n", + "\n", + "DATA_DIR = os.path.expanduser(\"t2t/data\")\n", + "TMP_DIR = os.path.expanduser(\"t2t/tmp\")\n", + "TRAIN_DIR = os.path.expanduser(\"t2t/train\")\n", + "EXPORT_DIR = os.path.expanduser(\"t2t/export\")\n", + "TRANSLATIONS_DIR = os.path.expanduser(\"t2t/translation\")\n", + "EVENT_DIR = os.path.expanduser(\"t2t/event\")\n", + "USR_DIR = os.path.expanduser(\"t2t/user\")\n", + " \n", + "tf.gfile.MakeDirs(DATA_DIR)\n", + 
"tf.gfile.MakeDirs(TMP_DIR)\n", + "tf.gfile.MakeDirs(TRAIN_DIR)\n", + "tf.gfile.MakeDirs(EXPORT_DIR)\n", + "tf.gfile.MakeDirs(TRANSLATIONS_DIR)\n", + "tf.gfile.MakeDirs(EVENT_DIR)\n", + "tf.gfile.MakeDirs(USR_DIR)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/data_generators/translate.py:169: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/data_generators/translate.py:169: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/data_generators/translate.py:173: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/data_generators/translate.py:173: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n",
+    "\n"
+   ]
+  },
+  {
+   "name": "stdout",
+   "output_type": "stream",
+   "text": [
+    "INFO:tensorflow:Downloading https://f000.backblazeb2.com/file/malay-dataset/train-translation.tar.gz to t2t/tmp/train-translation.tar.gz\n",
+    "100% completed\n",
+    "INFO:tensorflow:Successfully downloaded train-translation.tar.gz, 17167428 bytes.\n",
+    "INFO:tensorflow:Generating vocab file: t2t/data/vocab.paraphrase32k.32768.subwords\n",
+    "INFO:tensorflow:Generating vocab from: [['https://f000.backblazeb2.com/file/malay-dataset/train-translation.tar.gz', ('train/before.txt', 'train/after.txt')]]\n",
+    "INFO:tensorflow:Not downloading, file already found: t2t/tmp/train-translation.tar.gz\n",
+    "INFO:tensorflow:Reading file: train/before.txt\n",
+    "INFO:tensorflow:Reading file: train/after.txt\n",
+    "INFO:tensorflow:Trying min_count 500\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 1179\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 661\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 716\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 701\n",
+    "INFO:tensorflow:Trying min_count 250\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 2109\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 1046\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 1112\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 1107\n",
+    "INFO:tensorflow:Trying min_count 125\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 3802\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 1710\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 1811\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 1774\n",
+    "INFO:tensorflow:Trying min_count 62\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 6786\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 2753\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 2871\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 2853\n",
+    "INFO:tensorflow:Trying min_count 31\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 11384\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 4320\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 4476\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 4459\n",
+    "INFO:tensorflow:Trying min_count 15\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 18660\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 6853\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 7074\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 7034\n",
+    "INFO:tensorflow:Trying min_count 7\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 30380\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 10889\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 11162\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 11128\n",
+    "INFO:tensorflow:Trying min_count 3\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 50058\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 17713\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 18027\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 17971\n",
+    "INFO:tensorflow:Trying min_count 1\n",
+    "INFO:tensorflow:Iteration 0\n",
+    "INFO:tensorflow:vocab_size = 85772\n",
+    "INFO:tensorflow:Iteration 1\n",
+    "INFO:tensorflow:vocab_size = 28026\n",
+    "INFO:tensorflow:Iteration 2\n",
+    "INFO:tensorflow:vocab_size = 28026\n",
+    "INFO:tensorflow:Iteration 3\n",
+    "INFO:tensorflow:vocab_size = 28026\n",
+    "INFO:tensorflow:Generating case 0.\n",
+    "INFO:tensorflow:Generating case 100000.\n",
+    "INFO:tensorflow:Generated 200000 Examples\n",
+    "INFO:tensorflow:Downloading https://f000.backblazeb2.com/file/malay-dataset/test-translation.tar.gz to t2t/tmp/test-translation.tar.gz\n",
+    "101% completed\n",
+    "INFO:tensorflow:Successfully downloaded test-translation.tar.gz, 428741 bytes.\n",
+    "INFO:tensorflow:Found vocab file: t2t/data/vocab.paraphrase32k.32768.subwords\n",
+    "INFO:tensorflow:Generating case 0.\n",
+    "INFO:tensorflow:Generated 5000 Examples\n",
+    "INFO:tensorflow:Shuffling data...\n",
+    "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/data_generators/generator_utils.py:477: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.\n",
+    "Instructions for updating:\n",
+    "Use eager execution and: \n",
+    "`tf.data.TFRecordDataset(path)`\n",
+    "INFO:tensorflow:Data shuffled.\n"
+   ]
+  }
+ ],
+ "source": [
+  "PROBLEM = 'paraphrase32k'\n",
+  "t2t_problem = 
problems.problem(PROBLEM)\n",
+  "t2t_problem.generate_data(DATA_DIR, TMP_DIR)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_steps = 50000\n",
+    "eval_steps = 10\n",
+    "batch_size = 512 * 2\n",
+    "save_checkpoints_steps = 10000\n",
+    "ALPHA = 0.1\n",
+    "schedule = \"continuous_train_and_eval\"\n",
+    "MODEL = \"transformer\"\n",
+    "HPARAMS = \"transformer_base\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/optimize.py:187: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n",
+      "\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/trainer_lib.py:111: The name tf.OptimizerOptions is deprecated. Please use tf.compat.v1.OptimizerOptions instead.\n",
+      "\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/models/research/neural_stack.py:52: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.\n",
+      "\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_gan/python/estimator/tpu_gan_estimator.py:42: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from tensor2tensor.utils.trainer_lib import create_run_config, create_experiment\n",
+    "from tensor2tensor.utils.trainer_lib import create_hparams\n",
+    "from tensor2tensor.utils import registry\n",
+    "from tensor2tensor import models\n",
+    "from tensor2tensor import problems\n",
+    "\n",
+    "hparams = create_hparams(HPARAMS)\n",
+    "hparams.batch_size = batch_size\n",
+    "hparams.learning_rate = ALPHA"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "RUN_CONFIG = create_run_config(\n",
+    "    model_dir=TRAIN_DIR,\n",
+    "    model_name=MODEL,\n",
+    "    save_checkpoints_steps= save_checkpoints_steps,\n",
+    "    num_gpus=2\n",
+    ")\n",
+    "\n",
+    "tensorflow_exp_fn = create_experiment(\n",
+    "    run_config=RUN_CONFIG,\n",
+    "    hparams=hparams,\n",
+    "    model_name=MODEL,\n",
+    "    problem_name=PROBLEM,\n",
+    "    data_dir=DATA_DIR, \n",
+    "    train_steps=train_steps, \n",
+    "    
eval_steps=eval_steps, \n",
+    "    #use_xla=True # For acceleration\n",
+    "    ) \n",
+    "\n",
+    "tensorflow_exp_fn.train_and_evaluate()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "('t2t/data/vocab.paraphrase32k.32768.subwords', 't2t/train/model.ckpt-50000')"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "vocab_file = \"t2t/data/vocab.paraphrase32k.32768.subwords\"\n",
+    "ckpt_path = tf.train.latest_checkpoint(os.path.join('t2t/train'))\n",
+    "vocab_file, ckpt_path"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensor2tensor.data_generators import text_encoder\n",
+    "\n",
+    "encoder = text_encoder.SubwordTextEncoder(vocab_file)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tensor2tensor import models\n",
+    "from tensor2tensor import problems\n",
+    "from tensor2tensor.layers import common_layers\n",
+    "from tensor2tensor.utils import trainer_lib\n",
+    "from tensor2tensor.utils import t2t_model\n",
+    "from tensor2tensor.utils import registry\n",
+    "from tensor2tensor.utils import metrics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{'inputs': , 'targets': , 'target_space_id': }\n",
+      "INFO:tensorflow:Setting T2TModel mode to 'infer'\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n",
+      "  warnings.warn('An interactive session is already active. 
This can '\n",
+      "INFO:tensorflow:Setting T2TModel mode to 'infer'\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "INFO:tensorflow:Setting hparams.dropout to 0.0\n",
+      "INFO:tensorflow:Setting hparams.label_smoothing to 0.0\n",
+      "INFO:tensorflow:Setting hparams.layer_prepostprocess_dropout to 0.0\n",
+      "INFO:tensorflow:Setting hparams.symbol_dropout to 0.0\n",
+      "INFO:tensorflow:Setting hparams.attention_dropout to 0.0\n",
+      "INFO:tensorflow:Setting hparams.relu_dropout to 0.0\n",
+      "INFO:tensorflow:Using variable initializer: uniform_unit_scaling\n",
+      "INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_28026_512.bottom\n",
+      "INFO:tensorflow:Transforming feature 'targets' with symbol_modality_28026_512.targets_bottom\n",
+      "INFO:tensorflow:Building model body\n",
+      "INFO:tensorflow:Transforming body output with symbol_modality_28026_512.top\n",
+      "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/models/transformer.py:1226: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n",
+      "Instructions for updating:\n",
+      "Use `tf.cast` instead.\n",
+      "INFO:tensorflow:Using variable initializer: uniform_unit_scaling\n",
+      "INFO:tensorflow:Transforming feature 'inputs' with symbol_modality_28026_512.bottom\n",
+      "INFO:tensorflow:Transforming feature 'targets' with symbol_modality_28026_512.targets_bottom\n",
+      "INFO:tensorflow:Building model body\n",
+      "INFO:tensorflow:Transforming body output with symbol_modality_28026_512.top\n",
+      "INFO:tensorflow:Restoring parameters from t2t/train/model.ckpt-50000\n"
+     ]
+    }
+   ],
+   "source": [
+    "class Model:\n",
+    "    def __init__(self, HPARAMS = \"transformer_base\", DATA_DIR = 't2t/data'):\n",
+    "        \n",
+    "        self.X = tf.placeholder(tf.int32, [None, None])\n",
+    "        self.Y = tf.placeholder(tf.int32, [None, None])\n",
+    "        \n",
+    "        self.X_seq_len = tf.count_nonzero(self.X, 1, 
dtype=tf.int32)\n", + " maxlen_decode = tf.reduce_max(self.X_seq_len) + 50\n", + " \n", + " x = tf.expand_dims(tf.expand_dims(self.X, -1), -1)\n", + " y = tf.expand_dims(tf.expand_dims(self.Y, -1), -1)\n", + " \n", + " features = {\n", + " \"inputs\": x,\n", + " \"targets\": y,\n", + " \"target_space_id\": tf.constant(1, dtype=tf.int32),\n", + " }\n", + " print(features)\n", + " \n", + " Modes = tf.estimator.ModeKeys\n", + " hparams = trainer_lib.create_hparams(HPARAMS, data_dir=DATA_DIR, problem_name=PROBLEM)\n", + " translate_model = registry.model('transformer')(hparams, Modes.PREDICT)\n", + " logits, _ = translate_model(features)\n", + " \n", + " with tf.variable_scope(tf.get_variable_scope(), reuse=True):\n", + " self.fast_result = translate_model._greedy_infer(features, maxlen_decode)[\"outputs\"]\n", + " self.beam_result = translate_model._beam_decode_slow(\n", + " features, maxlen_decode, beam_size=5, \n", + " top_beams=1, alpha=1.0)[\"outputs\"]\n", + " \n", + " self.fast_result = tf.identity(self.fast_result, name = 'greedy')\n", + " self.beam_result = tf.identity(self.beam_result, name = 'beam')\n", + " \n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, ckpt_path)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "encoded = encoder.encode(test_X[0]) + [1]\n", + "r = sess.run(model.fast_result, feed_dict = {model.X: [encoded]})" + ] + }, + { + "cell_type": "code", 
+ "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"La Convention de La Haye du 29 mai 1993 sur la protection des enfants et la coopération en matière d'adoption par pays sont réglementées par la Convention de La Haye et par les lois nationales.\"" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "encoder.decode(r[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Ce domaine est régi par la Convention de La Haye du 29 mai 1993 sur la protection des enfants et la coopération en matière d'adoption internationale, ainsi que par les législations nationales.\"" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_Y[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import bleu_hook\n", + "\n", + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences\n", + "batch_size = 32" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 157/157 [05:17<00:00, 2.02s/it]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "results = []\n", + "for i in tqdm(range(0, len(test_X), batch_size)):\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i: index]\n", + " batch_x = [encoder.encode(r) + [1] for r in batch_x]\n", + " batch_x = pad_sequences(batch_x, padding='post')\n", + " p = sess.run(model.fast_result,feed_dict = {model.X: batch_x})\n", + " result = []\n", + " for row in p:\n", + " result.append([i for i in row if i > 1])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + 
"rights = []\n", + "for r in test_Y:\n", + " rights.append(encoder.encode(r))" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.36773485" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/54.bert-lstm-luong.ipynb b/neural-machine-translation/54.bert-lstm-luong.ipynb deleted file mode 100644 index 5e97806..0000000 --- a/neural-machine-translation/54.bert-lstm-luong.ipynb +++ /dev/null @@ -1,771 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", - "# !unzip uncased_L-12_H-768_A-12.zip" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", - "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", - "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import bert\n", - "from bert import run_classifier\n", - "from bert import optimization\n", - "from bert import 
tokenization\n", - "from bert import modeling\n", - "import tensorflow as tf" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "tokenization.validate_case_matches_checkpoint(True,BERT_INIT_CHKPNT)\n", - "tokenizer = tokenization.FullTokenizer(\n", - " vocab_file=BERT_VOCAB, do_lower_case=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/neural-machine-translation/english-train\n", - "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/neural-machine-translation/vietnam-train" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "import collections\n", - "\n", - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = 
fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_to['GO']\n", - "PAD = dictionary_to['PAD']\n", - "EOS = dictionary_to['EOS']\n", - "UNK = dictionary_to['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 500/500 [00:00<00:00, 2828.55it/s]\n" - ] - } - ], - "source": [ - "MAX_SEQ_LENGTH = 200\n", - "\n", - "from tqdm import tqdm\n", - "\n", - "input_ids, input_masks, segment_ids = [], [], []\n", - "\n", - "for text in tqdm(text_from):\n", - " tokens_a = tokenizer.tokenize(text)\n", - " if len(tokens_a) > MAX_SEQ_LENGTH - 2:\n", - " tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n", - " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", - " segment_id = 
[0] * len(tokens)\n", - " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", - " input_mask = [1] * len(input_id)\n", - " padding = [0] * (MAX_SEQ_LENGTH - len(input_id))\n", - " input_id += padding\n", - " input_mask += padding\n", - " segment_id += padding\n", - " \n", - " input_ids.append(input_id)\n", - " input_masks.append(input_mask)\n", - " segment_ids.append(segment_id)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", - "epoch = 20\n", - "batch_size = 10\n", - "warmup_proportion = 0.1\n", - "num_train_steps = int(len(input_ids) / batch_size * epoch)\n", - "num_warmup_steps = int(num_train_steps * warmup_proportion)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " to_dict_size, learning_rate, dropout = 0.5):\n", - " \n", - " def gru_cell(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer, reuse=reuse)\n", - " \n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layer, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([gru_cell(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layer)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", - " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = 
tf.shape(self.X)[0]\n", - " \n", - " model = modeling.BertModel(\n", - " config=bert_config,\n", - " is_training=True,\n", - " input_ids=self.X,\n", - " input_mask=self.input_masks,\n", - " token_type_ids=self.segment_ids,\n", - " use_one_hot_embeddings=False)\n", - " \n", - " self.encoder_out = model.get_sequence_output()\n", - " self.encoder_state = tf.layers.dense(model.get_pooled_output(), size_layer)\n", - " self.encoder_state = tuple(self.encoder_state for _ in range(num_layers))\n", - " \n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " decoder_cell = attention(self.encoder_out, self.X_seq_len)\n", - " dense_layer = tf.layers.Dense(to_dict_size)\n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embeddings,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = predicting_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, 
tf.float32).clone(cell_state=self.encoder_state),\n", - " output_layer = dense_layer)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " \n", - " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", - " num_train_steps, num_warmup_steps, False)\n", - " \n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 2e-5" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, \n", - " len(dictionary_to), learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" - ] - } - 
], - "source": [ - "sess.run(tf.global_variables_initializer())\n", - "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", - "saver = tf.train.Saver(var_list = var_lists)\n", - "saver.restore(sess, BERT_INIT_CHKPNT)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k, 2))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 50/50 [01:01<00:00, 1.20s/it, accuracy=0.0389, cost=6.67] \n", - "train minibatch loop: 0%| | 0/50 [00:00= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - 
"source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_to['GO']\n", - "PAD = dictionary_to['PAD']\n", - "EOS = dictionary_to['EOS']\n", - "UNK = dictionary_to['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 500/500 [00:00<00:00, 2757.07it/s]\n" - ] - } - ], - "source": [ - "MAX_SEQ_LENGTH = 200\n", - "\n", - "from tqdm import tqdm\n", - "\n", - "input_ids, input_masks, segment_ids = [], [], []\n", - "\n", - "for text in tqdm(text_from):\n", - " tokens_a = 
tokenizer.tokenize(text)\n", - " if len(tokens_a) > MAX_SEQ_LENGTH - 2:\n", - " tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n", - " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", - " segment_id = [0] * len(tokens)\n", - " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", - " input_mask = [1] * len(input_id)\n", - " padding = [0] * (MAX_SEQ_LENGTH - len(input_id))\n", - " input_id += padding\n", - " input_mask += padding\n", - " segment_id += padding\n", - " \n", - " input_ids.append(input_id)\n", - " input_masks.append(input_mask)\n", - " segment_ids.append(segment_id)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", - "epoch = 20\n", - "batch_size = 10\n", - "warmup_proportion = 0.1\n", - "num_train_steps = int(len(input_ids) / batch_size * epoch)\n", - "num_warmup_steps = int(num_train_steps * warmup_proportion)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def embed_seq(x, vocab_sz, embed_dim, name, zero_pad=True): \n", - " embedding = tf.get_variable(name, [vocab_sz, embed_dim]) \n", - " if zero_pad:\n", - " embedding = tf.concat([tf.zeros([1, embed_dim]), embedding[1:, :]], 0) \n", - " x = tf.nn.embedding_lookup(embedding, x)\n", - " return x\n", - "\n", - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - 
mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size, \n", - " to_dict_size, learning_rate, kernel_size = 2, n_attn_heads = 16):\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", - " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " self.embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " self.model = modeling.BertModel(\n", - " config=bert_config,\n", - " is_training=True,\n", - " input_ids=self.X,\n", - " input_mask=self.input_masks,\n", - " token_type_ids=self.segment_ids,\n", - " use_one_hot_embeddings=False)\n", - " \n", - " self.num_layers = num_layers\n", - " self.kernel_size = 
kernel_size\n", - " self.size_layer = size_layer\n", - " self.n_attn_heads = n_attn_heads\n", - " self.dict_size = to_dict_size\n", - " \n", - " self.training_logits = self.forward(self.X, decoder_input)\n", - "\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", - " \n", - " def forward(self, x, y, reuse = False):\n", - " with tf.variable_scope('forward',reuse=reuse):\n", - " with tf.variable_scope('forward',reuse=reuse):\n", - " encoder_embedded = self.model.get_sequence_output()\n", - " decoder_embedded = tf.nn.embedding_lookup(self.embedding, y)\n", - " \n", - " g = tf.identity(decoder_embedded)\n", - " for i in range(self.num_layers):\n", - " dilation_rate = 2 ** i\n", - " pad_sz = (self.kernel_size - 1) * dilation_rate\n", - " with tf.variable_scope('decode_%d'%i,reuse=reuse):\n", - " attn_res = h = cnn_block(decoder_embedded, dilation_rate, \n", - " pad_sz, self.size_layer, self.kernel_size)\n", - " C = []\n", - " for j in range(self.n_attn_heads):\n", - " h_ = tf.layers.dense(h, self.size_layer//self.n_attn_heads)\n", - " g_ = tf.layers.dense(g, self.size_layer//self.n_attn_heads)\n", - " zu_ = tf.layers.dense(encoder_embedded, self.size_layer//self.n_attn_heads)\n", - " ze_ = tf.layers.dense(encoder_embedded, self.size_layer//self.n_attn_heads)\n", - "\n", - " d = tf.layers.dense(h_, 
self.size_layer//self.n_attn_heads) + g_\n", - " dz = tf.matmul(d, tf.transpose(zu_, [0, 2, 1]))\n", - " a = tf.nn.softmax(dz)\n", - " c_ = tf.matmul(a, ze_)\n", - " C.append(c_)\n", - "\n", - " c = tf.concat(C, 2)\n", - " h = tf.layers.dense(attn_res + c, self.size_layer)\n", - " decoder_embedded += h\n", - "\n", - " return tf.layers.dense(decoder_embedded, self.dict_size)" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 2e-5\n", - "batch_size = 8\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, \n", - " len(dictionary_to), learning_rate)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" - ] - } - ], - "source": [ - "sess.run(tf.global_variables_initializer())\n", - "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", - "saver = tf.train.Saver(var_list = var_lists)\n", - "saver.restore(sess, BERT_INIT_CHKPNT)" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k, 2))\n", - " X.append(ints)\n", - " return X\n", - "\n", - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - 
" padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 63/63 [01:07<00:00, 1.10it/s, accuracy=0.0811, cost=7.08]\n", - "train minibatch loop: 0%| | 0/63 [00:00:11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :23: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :26: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the 
dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[ 9474, 9474, 8026, 8026, 11354, 11354, 30856, 19043, 25768,\n", + " 25768, 5578, 30394, 30394, 30394, 30394, 30394, 31105, 31105,\n", + " 5803, 1253, 1253, 1253, 29253, 29253, 29253, 29253, 29253,\n", + " 10336, 10336, 6833, 6833, 5621, 5621, 5621, 
5621, 5621,\n", + " 5621, 5621, 14544, 14544, 7972, 7972, 7972, 7972, 19354,\n", + " 19354, 19354, 19354, 22886, 28105, 7491, 7491, 7491, 7491,\n", + " 2215, 2215, 1830, 1830, 1830, 27806, 1664, 1664, 1664,\n", + " 7601, 7601, 7601, 6772, 6772, 31802, 31802, 31802, 26975],\n", + " [19758, 2395, 11639, 11639, 3554, 14988, 14988, 16578, 16578,\n", + " 28519, 28519, 29929, 29929, 8231, 8231, 19925, 3138, 3138,\n", + " 26914, 26338, 27344, 27344, 27344, 27344, 27344, 13816, 15748,\n", + " 15748, 15748, 16686, 16095, 16095, 16095, 16095, 4248, 24085,\n", + " 24085, 5160, 24085, 24085, 5866, 6372, 6372, 6372, 6372,\n", + " 935, 26480, 26480, 26480, 26480, 2037, 21200, 155, 155,\n", + " 13017, 13017, 13017, 6820, 233, 6820, 6820, 31387, 14536,\n", + " 10176, 10176, 10176, 10176, 30230, 31099, 30230, 31099, 25218],\n", + " [ 4447, 4447, 417, 25156, 25156, 25156, 25156, 25156, 25156,\n", + " 2756, 25156, 3504, 25957, 25957, 4286, 23244, 23244, 23244,\n", + " 1036, 638, 22027, 25615, 25615, 15029, 25615, 2538, 20524,\n", + " 883, 883, 4898, 4898, 4898, 6984, 30983, 16765, 16765,\n", + " 16765, 16765, 28003, 7360, 7360, 23123, 23123, 18384, 27342,\n", + " 27342, 27342, 20580, 20580, 20580, 13977, 13977, 13977, 13977,\n", + " 28319, 28319, 17141, 17141, 17141, 15452, 9311, 9311, 9311,\n", + " 24181, 24181, 25791, 25791, 7274, 7274, 4516, 10100, 10100],\n", + " [26050, 16590, 16590, 16590, 16590, 5385, 5385, 5385, 172,\n", + " 15661, 15661, 17136, 22151, 22151, 27348, 18651, 6074, 6074,\n", + " 6074, 6074, 6074, 6074, 12982, 12982, 12982, 6521, 29972,\n", + " 29972, 6511, 7819, 7819, 2609, 14359, 14359, 14359, 3705,\n", + " 9541, 25303, 9541, 25303, 4792, 2979, 17462, 17462, 17462,\n", + " 17462, 17462, 5527, 15841, 15841, 15396, 15396, 21546, 16237,\n", + " 16237, 16237, 16237, 2799, 2799, 2799, 13780, 23869, 23869,\n", + " 24792, 24792, 13022, 13022, 13022, 9865, 7696, 22248, 31606],\n", + " [19979, 20579, 22020, 22020, 576, 19948, 19948, 10791, 5248,\n", + " 5248, 5248, 
1185, 1185, 20392, 20392, 20392, 20392, 20392,\n", + " 31140, 31140, 18341, 18891, 18891, 18891, 18891, 18891, 31792,\n", + " 31792, 31792, 31792, 31792, 5547, 5547, 5547, 5060, 5060,\n", + " 5060, 18725, 18725, 18725, 3, 5237, 5237, 27163, 27163,\n", + " 27163, 27163, 27163, 29317, 9957, 9957, 9957, 17969, 17969,\n", + " 17969, 17969, 15075, 1140, 1140, 30802, 30802, 31919, 31919,\n", + " 31919, 31919, 12139, 6761, 2441, 2441, 382, 382, 382],\n", + " [14415, 9829, 1058, 1058, 23490, 23490, 23490, 23490, 23490,\n", + " 23490, 6579, 6579, 13956, 13956, 17003, 24228, 24228, 24228,\n", + " 24228, 11307, 11307, 9794, 9794, 9794, 24817, 24817, 16903,\n", + " 14146, 14146, 14146, 23557, 23557, 23557, 29409, 18324, 16263,\n", + " 7605, 7605, 20814, 20814, 20814, 20814, 20814, 5027, 5027,\n", + " 5027, 5027, 5027, 18909, 893, 8683, 8109, 8109, 8109,\n", + " 27426, 27426, 27426, 27426, 13855, 13855, 13855, 13855, 31722,\n", + " 31722, 31722, 31722, 25075, 25075, 22512, 20703, 20703, 20703],\n", + " [22911, 18466, 18466, 12076, 4050, 4050, 4050, 19813, 19813,\n", + " 19813, 19813, 2865, 2865, 2865, 14321, 14321, 14321, 14321,\n", + " 14321, 14321, 4438, 4438, 4438, 4438, 8582, 8582, 8582,\n", + " 8582, 26578, 11339, 11339, 11339, 11339, 11339, 11339, 11339,\n", + " 11339, 13361, 13361, 26873, 26873, 26873, 22957, 22957, 13748,\n", + " 13748, 13748, 5623, 5623, 5623, 17672, 17672, 17672, 15028,\n", + " 15028, 15028, 8560, 8560, 30564, 29567, 29567, 30564, 10013,\n", + " 10013, 10013, 10013, 10013, 10013, 3026, 3026, 3026, 3026],\n", + " [10577, 29066, 29066, 25428, 13980, 26499, 25428, 12203, 12203,\n", + " 10868, 10868, 27443, 27443, 370, 370, 12663, 12663, 26829,\n", + " 2433, 2433, 2433, 2916, 16306, 16306, 31577, 4770, 18127,\n", + " 18127, 18127, 8767, 21902, 21902, 21902, 3602, 10718, 10718,\n", + " 10718, 10718, 6790, 10718, 17278, 25493, 14993, 14993, 14993,\n", + " 14993, 3650, 18070, 18070, 1424, 1424, 3362, 3362, 3362,\n", + " 19308, 29145, 27664, 14634, 14634, 
18000, 25387, 29952, 29952,\n", + " 29952, 29952, 23214, 23214, 1483, 20303, 20303, 28586, 28586],\n", + " [29893, 29893, 10716, 24337, 24337, 24259, 24259, 24259, 29739,\n", + " 29739, 6923, 6923, 2749, 2749, 2749, 8470, 8470, 3967,\n", + " 3967, 3967, 3967, 3967, 3967, 19957, 19957, 19957, 14301,\n", + " 14301, 14562, 14562, 9031, 18729, 18729, 18729, 18729, 18729,\n", + " 18729, 22677, 22677, 4087, 4087, 4087, 30576, 4323, 4323,\n", + " 4323, 4323, 3100, 3100, 3100, 3100, 13917, 13917, 15060,\n", + " 15060, 15060, 11862, 11862, 25957, 3450, 3450, 3450, 3450,\n", + " 21986, 21986, 21986, 29361, 24712, 26797, 28380, 28380, 28380],\n", + " [30362, 24507, 3881, 3881, 645, 645, 645, 645, 645,\n", + " 645, 5768, 5768, 5768, 5768, 29522, 25843, 25843, 25843,\n", + " 25843, 26413, 26413, 26413, 26413, 26413, 26413, 19608, 19608,\n", + " 8723, 8723, 8723, 8723, 8723, 19173, 19173, 19173, 26717,\n", + " 26717, 30606, 30606, 1418, 29528, 12212, 12212, 12212, 10802,\n", + " 2835, 2835, 2835, 20336, 9931, 28285, 28285, 28285, 22681,\n", + " 22681, 16125, 24028, 24028, 24028, 24028, 27922, 27922, 2786,\n", + " 2786, 10073, 10073, 10073, 10073, 20427, 20427, 20427, 20427]],\n", + " dtype=int32), 10.371124, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [15:43<00:00, 1.66it/s, accuracy=0.258, cost=4.66]\n", + "minibatch loop: 100%|██████████| 40/40 [00:13<00:00, 2.94it/s, accuracy=0.328, cost=4.03]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " 
results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.051461186" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/6.gru-seq2seq-greedy.ipynb b/neural-machine-translation/6.gru-seq2seq-greedy.ipynb deleted file mode 100644 index ce0a5d2..0000000 --- a/neural-machine-translation/6.gru-seq2seq-greedy.ipynb +++ /dev/null @@ -1,392 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " 
dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 
472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype=tf.int32)\n", - " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embedding = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embedding = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " \n", - " _, encoder_state = tf.nn.dynamic_rnn(\n", - " cell = 
tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)]), \n", - " inputs = tf.nn.embedding_lookup(encoder_embedding, self.X),\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32)\n", - " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " dense = tf.layers.Dense(to_dict_size)\n", - " decoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = tf.nn.embedding_lookup(decoder_embedding, decoder_input),\n", - " sequence_length = self.Y_seq_len,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = training_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", - " self.training_logits = training_decoder_output.rnn_output\n", - " \n", - " predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(\n", - " embedding = decoder_embedding,\n", - " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", - " end_token = EOS)\n", - " predicting_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cells,\n", - " helper = predicting_helper,\n", - " initial_state = encoder_state,\n", - " output_layer = dense)\n", - " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = predicting_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = 2 * tf.reduce_max(self.X_seq_len))\n", - " self.predicting_ids = predicting_decoder_output.sample_id\n", - " \n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = 
tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.training_logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in 
sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 6.595950, avg accuracy: 0.050108\n", - "epoch: 2, avg loss: 6.199383, avg accuracy: 0.060167\n", - "epoch: 3, avg loss: 6.084990, avg accuracy: 0.083561\n", - "epoch: 4, avg loss: 5.962430, avg accuracy: 0.097788\n", - "epoch: 5, avg loss: 5.888171, avg accuracy: 0.108525\n", - "epoch: 6, avg loss: 5.832853, avg accuracy: 0.114980\n", - "epoch: 7, avg loss: 5.758645, avg accuracy: 0.119347\n", - "epoch: 8, avg loss: 5.639997, avg accuracy: 0.127720\n", - "epoch: 9, avg loss: 5.506897, avg accuracy: 0.134039\n", - "epoch: 10, avg loss: 5.339745, avg accuracy: 0.146608\n", - "epoch: 11, avg loss: 5.130958, avg accuracy: 0.158601\n", - "epoch: 12, avg loss: 4.904993, avg accuracy: 0.176436\n", - "epoch: 13, avg loss: 4.674516, avg accuracy: 0.194624\n", - "epoch: 14, avg loss: 4.484314, avg accuracy: 0.215770\n", - "epoch: 15, avg loss: 4.271039, avg accuracy: 0.236365\n", - "epoch: 16, avg loss: 4.063129, avg accuracy: 0.258189\n", - "epoch: 17, avg loss: 3.863059, avg accuracy: 0.291242\n", - "epoch: 18, avg loss: 3.630035, avg accuracy: 0.326530\n", - "epoch: 19, avg loss: 3.402997, avg accuracy: 0.364216\n", - "epoch: 20, avg loss: 3.211018, avg accuracy: 0.408099\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k+batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index ], PAD)\n", - " predicted, accuracy,loss, _ = sess.run([model.predicting_ids, \n", - " 
model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: or , if you had to choose between the last two , which one would you choose ?\n", - "REAL ANSWER: sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ bạn có đau không ? đau như thế nào ?\n", - "PREDICTED ANSWER: sau khi thôi không làm đau mọi người nữa , bạn sẽ hỏi bạn sẽ hỏi bạn không ? \n", - "\n", - "row 2\n", - "QUESTION: i kept on doing this for a while .\n", - "REAL ANSWER: hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ?\n", - "PREDICTED ANSWER: hoặc nếu tôi chọn giữa 2 kiểu cuối , bạn sẽ chọn cái nào ? \n", - "\n", - "row 3\n", - "QUESTION: and then , like all good academic projects , i got more funding .\n", - "REAL ANSWER: tôi tiếp tục làm thí nghiệm này 1 thời gian\n", - "PREDICTED ANSWER: và tôi tiếp tục làm thí nghiệm này 1 thời gian \n", - "\n", - "row 4\n", - "QUESTION: i moved to sounds , electrical shocks -- i even had a pain suit that i could get people to feel much more pain .\n", - "REAL ANSWER: và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ .\n", - "PREDICTED ANSWER: và sau đó , tôi có thể lúc đó , tôi nhận thêm nguồn tài trợ . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/7.basic-birnn-seq2seq-manual.ipynb b/neural-machine-translation/7.basic-birnn-seq2seq-manual.ipynb deleted file mode 100644 index e23de84..0000000 --- a/neural-machine-translation/7.basic-birnn-seq2seq-manual.ipynb +++ /dev/null @@ -1,412 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = 
dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 
34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.BasicRNNCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), 
main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layer // 2),\n", - " cell_bw = cells(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - "\n", - " bi_state = tf.concat((state_fw,state_bw), -1)\n", - " last_state = tuple([bi_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - "\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", - " " - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - 
"execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From :6: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " 
seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.711091, avg accuracy: 0.881564\n", - "epoch: 2, avg loss: 0.864222, avg accuracy: 0.911636\n", - "epoch: 3, avg loss: 0.783447, avg accuracy: 0.913009\n", - "epoch: 4, avg loss: 0.752106, avg accuracy: 0.911964\n", - "epoch: 5, avg loss: 0.729447, avg accuracy: 0.913945\n", - "epoch: 6, avg loss: 0.724528, avg accuracy: 0.915882\n", - "epoch: 7, avg loss: 0.749808, avg accuracy: 0.913755\n", - "epoch: 8, avg loss: 0.721614, avg accuracy: 0.914509\n", - "epoch: 9, avg loss: 0.725813, avg accuracy: 0.912709\n", - "epoch: 10, avg loss: 0.731039, avg accuracy: 0.912827\n", - "epoch: 11, avg loss: 0.697485, avg accuracy: 0.915909\n", - "epoch: 12, avg loss: 0.710194, avg accuracy: 0.914564\n", - "epoch: 13, avg loss: 0.694111, avg accuracy: 0.915436\n", - "epoch: 14, avg loss: 0.690663, avg accuracy: 0.915491\n", - "epoch: 15, avg loss: 0.678798, avg accuracy: 0.916791\n", - "epoch: 16, avg loss: 0.666149, avg accuracy: 0.918091\n", - "epoch: 17, avg loss: 0.660566, avg accuracy: 0.918100\n", - "epoch: 18, avg loss: 0.653175, avg accuracy: 0.918264\n", - "epoch: 19, avg loss: 0.638035, avg accuracy: 0.919855\n", - "epoch: 20, avg loss: 0.631894, avg accuracy: 0.919491\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " 
model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: you can send weather balloons up into the stratosphere , collect microbes , see what 's up there .\n", - "REAL ANSWER: bạn có thể thả khí cầu thời tiết lên tầng tĩnh khí , thu thập vi khuẩn , xem điều gì đang xảy ra trên đó .\n", - "PREDICTED ANSWER: bạn có thể có tôi , , , " , , , , , , , , , , . . \n", - "\n", - "row 2\n", - "QUESTION: you can make a biocensor out of yeast to detect pollutants in water .\n", - "REAL ANSWER: bạn có thể làm ra một dụng cụ kiểm duyệt sinh học từ men để phát hiện chất gây ô nhiễm trong nước .\n", - "PREDICTED ANSWER: bạn có thể có một một một một , , của , , , . . \n", - "\n", - "row 3\n", - "QUESTION: she didn 't expect me to go there .\n", - "REAL ANSWER: chị ấy khồng nghĩ tôi sẽ đi .\n", - "PREDICTED ANSWER: chị đây : , tôi , tôi , . . \n", - "\n", - "row 4\n", - "QUESTION: its artists told stories across national boundaries , in as many languages , genres and philosophies as one can imagine .\n", - "REAL ANSWER: nó là những câu chuyện kể của các nghệ sĩ vượt qua các ranh giới quốc gia , dưới vô vàn ngôn ngữ , thể loại và triết lý mà một người có thể tưởng tượng ra được .\n", - "PREDICTED ANSWER: tôi là một , là , tôi , , , , , , , , , , , , tôi có . . 
\n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/7.basic-birnn-seq2seq.ipynb b/neural-machine-translation/7.basic-birnn-seq2seq.ipynb new file mode 100644 index 0000000..23aff42 --- /dev/null +++ b/neural-machine-translation/7.basic-birnn-seq2seq.ipynb @@ -0,0 +1,807 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + 
"source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.BasicRNNCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " 
encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = 
beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: BasicRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:460: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :43: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please 
see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 2324, 18549, 2575, 18223, 16621, 25074, 5758, 16105,\n", + " 9643, 14533, 2418, 17963, 9586, 20965, 16069, 31732, 8349,\n", + " 21344, 2124, 22121, 31430, 28780, 6104, 29465, 28991, 19626,\n", + " 31276, 30543, 6748, 28268, 14529, 19673, 29387, 14052, 22645,\n", + " 22591]],\n", + " \n", + " [[ 1, 4750, 23343, 23915, 14818, 27668, 1301, 23257, 8883,\n", + " 5229, 6420, 9555, 16498, 17761, 17527, 24783, 21504, 29624,\n", + " 5422, 10897, 1345, 29281, 12418, 391, 17635, 22516, 31582,\n", + " 12158, 8465, 8234, 3945, 19783, 31592, 18671, 23575, 25175,\n", + " 30270]],\n", + " \n", + " [[ 1, 11165, 28230, 28543, 3588, 8446, 30983, 5646, 18574,\n", + " 11500, 15001, 18724, 12355, 27114, 26040, 21531, 20297, 18974,\n", + " 19791, 5791, 18787, 2122, 24283, 23086, 15403, 16007, 4479,\n", + " 2500, 998, 30034, 479, 21757, 30849, 28705, 20484, 10625,\n", + " 
10963]],\n", + " \n", + " [[ 1, 21566, 5275, 16424, 18786, 23284, 14270, 19038, 29192,\n", + " 18887, 29299, 21467, 19797, 17395, 29100, 20028, 1229, 16055,\n", + " 22997, 9385, 19514, 4418, 31845, 26042, 9266, 1335, 3478,\n", + " 6712, 21682, 9796, 30716, 356, 19198, 18637, 29966, 10713,\n", + " 24315]],\n", + " \n", + " [[ 1, 16123, 25557, 9728, 21453, 6776, 15441, 10971, 23304,\n", + " 24959, 21721, 10360, 1855, 22637, 23298, 26272, 27013, 11847,\n", + " 21778, 25809, 5284, 22782, 20178, 8433, 25365, 3587, 5408,\n", + " 11522, 13084, 13902, 17884, 15812, 17167, 1198, 20439, 15428,\n", + " 2683]],\n", + " \n", + " [[ 1, 21147, 29925, 26650, 22128, 25346, 8285, 19830, 3179,\n", + " 14751, 27000, 26475, 29160, 31503, 9476, 12, 29077, 31117,\n", + " 27793, 944, 19313, 8688, 18711, 22847, 16255, 27079, 10947,\n", + " 25837, 10200, 15330, 16636, 31742, 15323, 10414, 7529, 12109,\n", + " 15700]],\n", + " \n", + " [[ 1, 7877, 14773, 3217, 17422, 1192, 8104, 23517, 5376,\n", + " 24541, 9258, 17910, 3494, 16991, 25395, 23339, 4406, 9769,\n", + " 21111, 10368, 8803, 28612, 8344, 10965, 1994, 7564, 21910,\n", + " 27063, 1434, 21804, 14596, 9499, 5702, 21906, 30476, 21280,\n", + " 19597]],\n", + " \n", + " [[ 1, 10728, 24759, 8878, 23175, 23340, 1856, 25952, 16777,\n", + " 26837, 24365, 2439, 17991, 3285, 9794, 19027, 4322, 1352,\n", + " 23518, 3633, 16292, 27457, 8510, 26018, 27915, 12412, 27939,\n", + " 15962, 19784, 25197, 25929, 9874, 31138, 29353, 14604, 17760,\n", + " 11035]],\n", + " \n", + " [[ 1, 24274, 548, 31365, 5771, 28082, 14166, 13537, 6208,\n", + " 30799, 1245, 10927, 20928, 1061, 18775, 31173, 28322, 22261,\n", + " 1384, 3940, 20274, 2011, 1977, 3613, 13572, 13625, 6727,\n", + " 5969, 19057, 29188, 31618, 29662, 5689, 10011, 7001, 11295,\n", + " 25172]],\n", + " \n", + " [[ 1, 17786, 21104, 19157, 31089, 21808, 19376, 27435, 22699,\n", + " 16683, 6913, 22721, 25268, 20002, 16191, 14436, 22575, 18775,\n", + " 7389, 28126, 3829, 31173, 3472, 9470, 20906, 
22210, 1466,\n", + " 5234, 18870, 26948, 5114, 21135, 31491, 9718, 28972, 19195,\n", + " 29744]]], dtype=int32), 10.369235, 0.0]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [07:55<00:00, 3.29it/s, accuracy=0.0875, cost=7.25]\n", + "minibatch loop: 100%|██████████| 40/40 [00:05<00:00, 7.29it/s, accuracy=0.0806, cost=6.93]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6.319555e-05" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/8.lstm-birnn-seq2seq-manual.ipynb 
b/neural-machine-translation/8.lstm-birnn-seq2seq-manual.ipynb deleted file mode 100644 index f5a8b22..0000000 --- a/neural-machine-translation/8.lstm-birnn-seq2seq-manual.ipynb +++ /dev/null @@ -1,410 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab 
from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - 
}, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layer // 2),\n", - " cell_bw = cells(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - "\n", - " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", - " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", - " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", - " last_state = tuple([bi_lstm_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = 
tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - "\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], 
- "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 3.057538, avg accuracy: 0.882773\n", - "epoch: 2, avg loss: 0.838540, avg accuracy: 0.910864\n", - "epoch: 3, avg loss: 0.763301, avg accuracy: 0.911609\n", - "epoch: 4, avg loss: 0.735146, avg accuracy: 0.912718\n", - "epoch: 5, avg loss: 0.724207, avg accuracy: 0.913800\n", - "epoch: 6, avg loss: 0.722383, avg accuracy: 0.913736\n", - "epoch: 7, avg loss: 0.714579, avg accuracy: 0.913891\n", - "epoch: 8, avg loss: 0.698534, avg accuracy: 0.915482\n", - "epoch: 9, avg loss: 0.708854, avg accuracy: 0.913973\n", - "epoch: 10, avg loss: 0.707431, avg accuracy: 0.912118\n", - "epoch: 11, avg loss: 0.683583, avg accuracy: 0.915700\n", - "epoch: 12, avg loss: 0.672532, avg accuracy: 0.916382\n", - "epoch: 13, avg loss: 0.675660, avg accuracy: 0.916018\n", - "epoch: 14, avg loss: 0.659123, avg accuracy: 0.917927\n", - "epoch: 15, avg loss: 0.657314, avg accuracy: 
0.917800\n", - "epoch: 16, avg loss: 0.649246, avg accuracy: 0.918791\n", - "epoch: 17, avg loss: 0.654536, avg accuracy: 0.918145\n", - "epoch: 18, avg loss: 0.647920, avg accuracy: 0.918618\n", - "epoch: 19, avg loss: 0.642469, avg accuracy: 0.919536\n", - "epoch: 20, avg loss: 0.648654, avg accuracy: 0.918473\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: a biohacker in germany , a journalist , wanted to know whose dog was leaving little presents on his street ?\n", - "REAL ANSWER: một nhà biohacker người đức , một nhà báo , muốn biết chó của ai đã để lại những " món quà " nho nhỏ trên đường ?\n", - "PREDICTED ANSWER: và tôi , , , , , , , , , , , , , , , , , , , , , , , . . \n", - "\n", - "row 2\n", - "QUESTION: so " rudolph the red-nosed reindeer " -- you know it ?\n", - "REAL ANSWER: bài " con tuần lộc mũi đỏ rudolph " -- bạn biết bài đó chứ ?\n", - "PREDICTED ANSWER: và tôi tôi , , , , , , , , , , , . . 
\n", - "\n", - "row 3\n", - "QUESTION: and it was with great delight that we found young people up and down the country explaining with authority what filibustering was and why the lords might defy their bedtime on a point of principle .\n", - "REAL ANSWER: và thật vui vô cùng khi chúng tôi thấy những bạn trẻ trên khắp đất nước giải thích với nhà cầm quyền rằng cản trở các đạo luật là gì và tại sao các nhà cầm quyền có thể định giờ ngủ của họ theo một nguyên tắc nào đó .\n", - "PREDICTED ANSWER: và và , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , \n", - "\n", - "row 4\n", - "QUESTION: she didn 't expect me to go there .\n", - "REAL ANSWER: chị ấy khồng nghĩ tôi sẽ đi .\n", - "PREDICTED ANSWER: tôi tôi tôi tôi tôi . . . . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/8.lstm-birnn-seq2seq.ipynb b/neural-machine-translation/8.lstm-birnn-seq2seq.ipynb new file mode 100644 index 0000000..13572d1 --- /dev/null +++ b/neural-machine-translation/8.lstm-birnn-seq2seq.ipynb @@ -0,0 +1,785 @@ +{ + 
"cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '3'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.LSTMCell(size_layer,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = 
tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state_c = tf.concat((state_fw.c, state_bw.c), -1)\n", + " bi_state_h = tf.concat((state_fw.h, state_bw.h), -1)\n", + " bi_lstm_state = tf.nn.rnn_cell.LSTMStateTuple(c=bi_state_c, h=bi_state_h)\n", + " last_state = tuple([bi_lstm_state] * num_layers)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = 
pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :50: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 15601, 5615, 29690, 29690, 29690, 16154, 22547, 22547,\n", + " 24217, 3729, 24217, 24217, 3493, 3493, 3493, 3493, 19484,\n", + " 16880, 28603, 28603, 22409, 22409, 19347, 1932, 14111, 28843,\n", + " 28843, 19816, 19816, 19816, 26916, 4480, 4480, 4480, 4480,\n", + " 28082]],\n", + " \n", + " [[ 1, 29471, 29471, 2294, 2294, 2294, 22212, 14558, 14558,\n", + " 26710, 24598, 24598, 15681, 15681, 15681, 15681, 15905, 15905,\n", + " 15905, 
15905, 15905, 8586, 8586, 6491, 6491, 6491, 18662,\n", + " 18662, 23371, 25562, 25562, 17801, 17801, 16091, 16091, 16091,\n", + " 20324]],\n", + " \n", + " [[ 1, 26174, 28821, 28821, 28821, 12738, 12738, 12738, 12738,\n", + " 5170, 5170, 28142, 4305, 31621, 31621, 31621, 7803, 29797,\n", + " 29797, 29797, 2192, 2947, 2947, 2947, 2947, 12268, 19555,\n", + " 19555, 14105, 14105, 19555, 30502, 14911, 14911, 27759, 4714,\n", + " 803]],\n", + " \n", + " [[ 1, 7008, 15227, 24222, 24222, 11821, 11821, 31941, 31941,\n", + " 31941, 18974, 18974, 23187, 2331, 2331, 10828, 5962, 5962,\n", + " 5962, 15774, 5048, 5048, 9939, 9939, 9939, 9939, 9939,\n", + " 9939, 3574, 423, 423, 423, 30410, 1630, 1630, 22594,\n", + " 1163]],\n", + " \n", + " [[ 1, 1262, 1262, 15376, 15376, 9905, 14789, 23219, 23219,\n", + " 23219, 3304, 3304, 3304, 3304, 4087, 19465, 19465, 17734,\n", + " 17734, 15653, 15653, 15653, 15653, 15653, 15653, 11115, 11115,\n", + " 2418, 2418, 17624, 17624, 1671, 1671, 1671, 17624, 31492,\n", + " 21315]],\n", + " \n", + " [[ 1, 16803, 3316, 3316, 3316, 22010, 22010, 3196, 3196,\n", + " 3196, 1426, 1426, 15639, 29029, 29029, 12790, 12790, 15817,\n", + " 6970, 6970, 6970, 20888, 20888, 21140, 21140, 23507, 23507,\n", + " 13935, 13935, 13935, 28924, 28924, 28924, 31229, 31229, 31229,\n", + " 31282]],\n", + " \n", + " [[ 1, 21582, 21078, 24819, 19118, 1031, 1031, 2523, 566,\n", + " 566, 4404, 29783, 7583, 7583, 3648, 7583, 24250, 23771,\n", + " 23771, 24250, 2455, 22139, 25647, 25647, 22139, 25544, 25544,\n", + " 3589, 26237, 26237, 26237, 11151, 11151, 29638, 17067, 15288,\n", + " 15288]],\n", + " \n", + " [[ 1, 22831, 22831, 2901, 2901, 2901, 2901, 8799, 5962,\n", + " 5962, 5962, 23948, 18262, 6184, 16047, 6184, 26399, 26399,\n", + " 23137, 15191, 15191, 14048, 29454, 29454, 29454, 29454, 29454,\n", + " 11971, 25859, 25859, 24187, 24336, 28015, 19262, 18761, 21083,\n", + " 18098]],\n", + " \n", + " [[ 1, 7484, 7484, 22580, 22580, 15241, 15241, 8799, 11997,\n", + " 11997, 
25166, 12196, 12196, 12196, 12196, 4730, 7142, 7142,\n", + " 7142, 7142, 15184, 24169, 24169, 24169, 12216, 12216, 12216,\n", + " 12216, 12216, 12216, 12216, 12216, 2199, 7839, 7839, 27895,\n", + " 24658]],\n", + " \n", + " [[ 1, 2176, 29221, 29221, 31132, 1739, 1739, 13537, 13537,\n", + " 3169, 3169, 3169, 7550, 6867, 6867, 6867, 21275, 22126,\n", + " 4675, 22126, 6622, 6622, 6622, 6622, 19638, 30402, 30402,\n", + " 9715, 9715, 29162, 6876, 27841, 27841, 27841, 18213, 18213,\n", + " 23982]]], dtype=int32), 10.373104, 0.0]" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:01<00:00, 2.60it/s, accuracy=0.109, cost=6.91] \n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 5.85it/s, accuracy=0.113, cost=6.67]\n", + "minibatch loop: 0%| | 0/1563 [00:00 3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.012854616" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + 
"language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/9.gru-birnn-seq2seq-manual.ipynb b/neural-machine-translation/9.gru-birnn-seq2seq-manual.ipynb deleted file mode 100644 index 0bda4d1..0000000 --- a/neural-machine-translation/9.gru-birnn-seq2seq-manual.ipynb +++ /dev/null @@ -1,401 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from sklearn.utils import shuffle\n", - "import re\n", - "import time\n", - "import collections\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words, atleast=1):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " counter = collections.Counter(words).most_common(n_words)\n", - " counter = [i for i in counter if i[1] >= atleast]\n", - " count.extend(counter)\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "len from: 500, len to: 500\n" - ] - } - ], - "source": [ - "with open('english-train', 'r') as fopen:\n", - " text_from = 
fopen.read().lower().split('\\n')[:-1]\n", - "with open('vietnam-train', 'r') as fopen:\n", - " text_to = fopen.read().lower().split('\\n')[:-1]\n", - "print('len from: %d, len to: %d'%(len(text_from), len(text_to)))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from size: 1935\n", - "Most common words [(',', 564), ('.', 477), ('the', 368), ('and', 286), ('to', 242), ('of', 220)]\n", - "Sample data [482, 483, 78, 6, 137, 484, 10, 226, 787, 14] ['rachel', 'pike', ':', 'the', 'science', 'behind', 'a', 'climate', 'headline', 'in']\n" - ] - } - ], - "source": [ - "concat_from = ' '.join(text_from).split()\n", - "vocabulary_size_from = len(list(set(concat_from)))\n", - "data_from, count_from, dictionary_from, rev_dictionary_from = build_dataset(concat_from, vocabulary_size_from)\n", - "print('vocab from size: %d'%(vocabulary_size_from))\n", - "print('Most common words', count_from[4:10])\n", - "print('Sample data', data_from[:10], [rev_dictionary_from[i] for i in data_from[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab to size: 1461\n", - "Most common words [(',', 472), ('.', 430), ('tôi', 283), ('và', 230), ('có', 199), ('chúng', 196)]\n", - "Sample data [84, 22, 668, 73, 10, 389, 110, 34, 81, 299] ['khoa', 'học', 'đằng', 'sau', 'một', 'tiêu', 'đề', 'về', 'khí', 'hậu']\n" - ] - } - ], - "source": [ - "concat_to = ' '.join(text_to).split()\n", - "vocabulary_size_to = len(list(set(concat_to)))\n", - "data_to, count_to, dictionary_to, rev_dictionary_to = build_dataset(concat_to, vocabulary_size_to)\n", - "print('vocab to size: %d'%(vocabulary_size_to))\n", - "print('Most common words', count_to[4:10])\n", - "print('Sample data', data_to[:10], [rev_dictionary_to[i] for i in data_to[:10]])" - ] - }, - { - "cell_type": "code", - 
"execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "GO = dictionary_from['GO']\n", - "PAD = dictionary_from['PAD']\n", - "EOS = dictionary_from['EOS']\n", - "UNK = dictionary_from['UNK']" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(len(text_to)):\n", - " text_to[i] += ' EOS'" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "class Chatbot:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " from_dict_size, to_dict_size, learning_rate, batch_size):\n", - " \n", - " def cells(size,reuse=False):\n", - " return tf.nn.rnn_cell.GRUCell(size,reuse=reuse)\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None, None])\n", - " self.X_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " encoder_embeddings = tf.Variable(tf.random_uniform([from_dict_size, embedded_size], -1, 1))\n", - " decoder_embeddings = tf.Variable(tf.random_uniform([to_dict_size, embedded_size], -1, 1))\n", - " encoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", - " decoder_embedded = tf.nn.embedding_lookup(encoder_embeddings, decoder_input)\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layer // 2),\n", - " cell_bw = cells(size_layer // 2),\n", - " inputs = encoder_embedded,\n", - " sequence_length = self.X_seq_len,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d'%(n))\n", - " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", - "\n", - " bi_state = 
tf.concat((state_fw,state_bw), -1)\n", - " last_state = tuple([bi_state] * num_layers)\n", - " \n", - " with tf.variable_scope(\"decoder\"):\n", - " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", - " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", - " initial_state = last_state,\n", - " dtype = tf.float32)\n", - " self.logits = tf.layers.dense(outputs,to_dict_size)\n", - "\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.logits,\n", - " targets = self.Y,\n", - " weights = masks)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " y_t = tf.argmax(self.logits,axis=2)\n", - " y_t = tf.cast(y_t, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.Y, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 256\n", - "num_layers = 2\n", - "embedded_size = 128\n", - "learning_rate = 0.001\n", - "batch_size = 16\n", - "epoch = 20" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Chatbot(size_layer, num_layers, embedded_size, len(dictionary_from), \n", - " len(dictionary_to), learning_rate,batch_size)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "def str_idx(corpus, dic):\n", - " X = []\n", - " for i in corpus:\n", - " ints = []\n", - " for k in i.split():\n", - " 
ints.append(dic.get(k,UNK))\n", - " X.append(ints)\n", - " return X" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "X = str_idx(text_from, dictionary_from)\n", - "Y = str_idx(text_to, dictionary_to)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(174, 220)" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "maxlen_question = max([len(x) for x in X]) * 2\n", - "maxlen_answer = max([len(y) for y in Y]) * 2\n", - "maxlen_question, maxlen_answer" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int, maxlen):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = maxlen\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(maxlen)\n", - " return padded_seqs, seq_lens" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 1, avg loss: 1.600767, avg accuracy: 0.882627\n", - "epoch: 2, avg loss: 0.739814, avg accuracy: 0.913555\n", - "epoch: 3, avg loss: 0.725232, avg accuracy: 0.913864\n", - "epoch: 4, avg loss: 0.716623, avg accuracy: 0.913936\n", - "epoch: 5, avg loss: 0.709351, avg accuracy: 0.914400\n", - "epoch: 6, avg loss: 0.689928, avg accuracy: 0.916364\n", - "epoch: 7, avg loss: 0.680712, avg accuracy: 0.916782\n", - "epoch: 8, avg loss: 0.670387, avg accuracy: 0.916964\n", - "epoch: 9, avg loss: 0.671381, avg accuracy: 0.916055\n", - "epoch: 10, avg loss: 0.668878, avg accuracy: 0.915600\n", - "epoch: 11, avg loss: 0.649184, avg accuracy: 0.917282\n", - "epoch: 12, avg loss: 0.638758, avg accuracy: 0.917464\n", - "epoch: 13, 
avg loss: 0.614561, avg accuracy: 0.920173\n", - "epoch: 14, avg loss: 0.607420, avg accuracy: 0.919564\n", - "epoch: 15, avg loss: 0.604646, avg accuracy: 0.918409\n", - "epoch: 16, avg loss: 0.594362, avg accuracy: 0.918727\n", - "epoch: 17, avg loss: 0.564419, avg accuracy: 0.922173\n", - "epoch: 18, avg loss: 0.557343, avg accuracy: 0.921609\n", - "epoch: 19, avg loss: 0.551805, avg accuracy: 0.921545\n", - "epoch: 20, avg loss: 0.538976, avg accuracy: 0.922818\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_loss, total_accuracy = 0, 0\n", - " X, Y = shuffle(X, Y)\n", - " for k in range(0, len(text_to), batch_size):\n", - " index = min(k + batch_size, len(text_to))\n", - " batch_x, seq_x = pad_sentence_batch(X[k: index], PAD, maxlen_answer)\n", - " batch_y, seq_y = pad_sentence_batch(Y[k: index], PAD, maxlen_answer)\n", - " predicted, accuracy, loss, _ = sess.run([tf.argmax(model.logits,2),\n", - " model.accuracy, model.cost, model.optimizer], \n", - " feed_dict={model.X:batch_x,\n", - " model.Y:batch_y,\n", - " model.X_seq_len:seq_x,\n", - " model.Y_seq_len:seq_y})\n", - " total_loss += loss\n", - " total_accuracy += accuracy\n", - " total_loss /= (len(text_to) / batch_size)\n", - " total_accuracy /= (len(text_to) / batch_size)\n", - " print('epoch: %d, avg loss: %f, avg accuracy: %f'%(i+1, total_loss, total_accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "row 1\n", - "QUESTION: this is such a new area , and as we say back in brooklyn , you ain 't seen nothin ' yet .\n", - "REAL ANSWER: đây là một lĩnh vực rất mới , và như chúng tôi nói ở brooklyn , bạn còn chưa thấy gì cả đâu . .\n", - "PREDICTED ANSWER: đây là là là là , , , , , , , , , , , , , , , . . . . 
\n", - "\n", - "row 2\n", - "QUESTION: so put your arms back up and flex your bicep .\n", - "REAL ANSWER: vâng , hãy giơ tay lên và cong cơ cánh tay lại .\n", - "PREDICTED ANSWER: đây đây là là là , , , , , , . . \n", - "\n", - "row 3\n", - "QUESTION: we stopped looking at him as a problem , and we started to look at him as an opportunity to improve .\n", - "REAL ANSWER: chúng tôi không còn coi bé là một vấn đề nữa , và chúng tôi bắt đầu coi bé như một cơ hội để trở nên tốt hơn .\n", - "PREDICTED ANSWER: chúng chúng tôi chúng chúng chúng chúng chúng chúng chúng chúng chúng chúng chúng chúng và và và và chúng chúng chúng chúng tôi . . . \n", - "\n", - "row 4\n", - "QUESTION: we reverse engineer lab equipment .\n", - "REAL ANSWER: chúng tôi tự chế dụng cụ phòng thí nghiệm .\n", - "PREDICTED ANSWER: chúng tôi tôi chúng bay để để . . . \n", - "\n" - ] - } - ], - "source": [ - "for i in range(len(batch_x)):\n", - " print('row %d'%(i+1))\n", - " print('QUESTION:',' '.join([rev_dictionary_from[n] for n in batch_x[i] if n not in [0,1,2,3]]))\n", - " print('REAL ANSWER:',' '.join([rev_dictionary_to[n] for n in batch_y[i] if n not in[0,1,2,3]]))\n", - " print('PREDICTED ANSWER:',' '.join([rev_dictionary_to[n] for n in predicted[i] if n not in[0,1,2,3]]),'\\n')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/neural-machine-translation/9.gru-birnn-seq2seq.ipynb b/neural-machine-translation/9.gru-birnn-seq2seq.ipynb new file mode 100644 index 0000000..31ec943 --- /dev/null +++ b/neural-machine-translation/9.gru-birnn-seq2seq.ipynb @@ -0,0 +1,817 @@ +{ + "cells": [ + { + "cell_type": "code", + 
"execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('dataset-bpe.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "EOS = 2\n", + "GO = 1\n", + "vocab_size = 32000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "train_Y = [i + [2] for i in train_Y]\n", + "test_Y = [i + [2] for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from tensor2tensor.utils import beam_search\n", + "\n", + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[[0.0]]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1], tf.shape(x)[2]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Translator:\n", + " def __init__(self, size_layer, num_layers, embedded_size, learning_rate):\n", + " \n", + " def cells(size_layer, reuse=False):\n", + " return tf.nn.rnn_cell.GRUCell(size_layer,reuse=reuse)\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " self.X_seq_len = tf.count_nonzero(self.X, 1, dtype = tf.int32)\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype = tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", 
+ " \n", + " embeddings = tf.Variable(tf.random_uniform([vocab_size, embedded_size], -1, 1))\n", + " \n", + " def forward(x, y, reuse = False):\n", + " X_seq_len = tf.count_nonzero(x, 1, dtype = tf.int32)\n", + " Y_seq_len = tf.count_nonzero(y, 1, dtype = tf.int32)\n", + " with tf.variable_scope('model',reuse=reuse):\n", + " encoder_embedded = tf.nn.embedding_lookup(embeddings, x)\n", + " decoder_embedded = tf.nn.embedding_lookup(embeddings, y)\n", + " \n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = encoder_embedded,\n", + " sequence_length = X_seq_len,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " encoder_embedded = tf.concat((out_fw, out_bw), 2)\n", + " \n", + " bi_state = tf.concat((state_fw,state_bw), -1)\n", + " last_state = tuple([bi_state] * num_layers)\n", + " \n", + " with tf.variable_scope(\"decoder\",reuse=reuse):\n", + " rnn_cells_dec = tf.nn.rnn_cell.MultiRNNCell([cells(size_layer) for _ in range(num_layers)])\n", + " outputs, _ = tf.nn.dynamic_rnn(rnn_cells_dec, decoder_embedded, \n", + " sequence_length=Y_seq_len,\n", + " initial_state = last_state,\n", + " dtype = tf.float32)\n", + " return tf.layers.dense(outputs,vocab_size)\n", + " \n", + " main = tf.strided_slice(self.X, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " self.training_logits = forward(self.X, decoder_input, reuse = False)\n", + " \n", + " self.training_logits = self.training_logits[:, :tf.reduce_max(self.Y_seq_len)]\n", + " self.training_logits = pad_second_dim(self.training_logits, tf.reduce_max(self.Y_seq_len))\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " 
targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " initial_ids = tf.fill([batch_size], GO)\n", + " def symbols_to_logits(ids):\n", + " x = tf.contrib.seq2seq.tile_batch(self.X, 1)\n", + " logits = forward(x, ids, reuse = True)\n", + " return logits[:, tf.shape(ids)[1]-1, :]\n", + " \n", + " final_ids, final_probs, _ = beam_search.beam_search(\n", + " symbols_to_logits,\n", + " initial_ids,\n", + " 1,\n", + " tf.reduce_max(self.X_seq_len),\n", + " vocab_size,\n", + " 0.0,\n", + " eos_id = EOS)\n", + " \n", + " self.fast_result = final_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 512\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "epoch = 20" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "WARNING:tensorflow:From :11: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is 
equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:559: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:565: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:244: where (from tensorflow.python.ops.array_ops) is deprecated and 
will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :43: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensor2tensor/utils/beam_search.py:745: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Translator(size_layer, num_layers, embedded_size, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + 
"metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([[[ 1, 13938, 8345, 8345, 14186, 26764, 25757, 15817, 29790,\n", + " 29269, 730, 22327, 22327, 22327, 4484, 5805, 5805, 5805,\n", + " 781, 781, 6936, 14832, 14832, 24701, 24701, 24701, 5844,\n", + " 20126, 20126, 20126, 20126, 20126, 22872, 22872, 20343, 20343,\n", + " 20343]],\n", + " \n", + " [[ 1, 4154, 25042, 10774, 6937, 6937, 6937, 7613, 10462,\n", + " 10462, 29060, 26384, 26384, 26384, 26384, 26384, 29351, 29351,\n", + " 54, 54, 31323, 554, 7297, 7297, 27100, 27100, 27100,\n", + " 600, 600, 23285, 23285, 23285, 6134, 6134, 31160, 31160,\n", + " 31160]],\n", + " \n", + " [[ 1, 21744, 21744, 21744, 4760, 4760, 4760, 25229, 30517,\n", + " 30517, 25229, 20311, 7714, 7714, 7714, 8515, 8515, 29331,\n", + " 29331, 26839, 26839, 26839, 26839, 26839, 11992, 11992, 937,\n", + " 937, 937, 12250, 12250, 12250, 12250, 26158, 25340, 25340,\n", + " 8487]],\n", + " \n", + " [[ 1, 9927, 9927, 19749, 10989, 10989, 10989, 20886, 20886,\n", + " 20886, 20886, 20886, 24308, 2567, 5809, 5809, 4463, 4463,\n", + " 4463, 17614, 4463, 3676, 5078, 5078, 5078, 15235, 15235,\n", + " 8239, 8239, 12600, 12600, 12600, 12600, 22153, 22153, 22153,\n", + " 22153]],\n", + " \n", + " [[ 1, 25542, 1377, 1377, 20631, 21967, 21967, 11844, 16455,\n", + " 16455, 16274, 16274, 13782, 13782, 9593, 9593, 9593, 9593,\n", + " 20914, 5082, 14145, 14145, 257, 30851, 30851, 30851, 30851,\n", + " 13910, 29007, 29007, 6538, 6538, 6538, 6538, 9754, 9754,\n", + " 19715]],\n", + " \n", + " [[ 1, 4719, 17239, 17239, 28455, 29520, 23543, 23543, 29841,\n", + " 29841, 16992, 10554, 10554, 4688, 17211, 20625, 20625, 20625,\n", + " 20625, 11645, 11645, 3506, 30731, 30731, 30731, 11570, 13081,\n", + " 23588, 23588, 8320, 23588, 25592, 25592, 31456, 16823, 16823,\n", + " 
8027]],\n", + " \n", + " [[ 1, 12687, 12687, 10287, 1688, 27283, 27283, 4668, 4668,\n", + " 14642, 25985, 11446, 11446, 11446, 11446, 1725, 29573, 29573,\n", + " 29573, 29573, 29573, 23551, 21031, 23551, 23551, 23551, 12781,\n", + " 12781, 25342, 29499, 29499, 14269, 14269, 14269, 20349, 20349,\n", + " 20349]],\n", + " \n", + " [[ 1, 24167, 27050, 20829, 20829, 17044, 17044, 5993, 28321,\n", + " 28321, 28321, 28321, 12024, 12024, 12024, 30162, 30162, 30162,\n", + " 30162, 22796, 22796, 22796, 13782, 13782, 13782, 13782, 1226,\n", + " 28739, 28739, 28739, 28739, 29374, 28242, 28242, 17099, 21210,\n", + " 25472]],\n", + " \n", + " [[ 1, 388, 12831, 12831, 12831, 26975, 26975, 2840, 20571,\n", + " 2134, 2134, 3335, 10751, 10751, 10751, 24942, 24942, 10919,\n", + " 10919, 10919, 10021, 10021, 10021, 10021, 10021, 3329, 31909,\n", + " 11306, 7614, 7614, 14543, 14543, 14543, 14543, 9044, 9044,\n", + " 9561]],\n", + " \n", + " [[ 1, 7345, 7345, 7345, 17415, 24456, 24456, 24456, 3291,\n", + " 2454, 2454, 3127, 23568, 23568, 6213, 6213, 29033, 23173,\n", + " 23173, 15292, 15292, 15292, 24058, 18943, 19299, 8014, 15661,\n", + " 9308, 9308, 9308, 27761, 27761, 16540, 30025, 30025, 7150,\n", + " 7150]]], dtype=int32), 10.373073, 0.0]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = pad_sequences(train_X[:10], padding='post')\n", + "batch_y = pad_sequences(train_Y[:10], padding='post')\n", + "\n", + "sess.run([model.fast_result, model.cost, model.accuracy], \n", + " feed_dict = {model.X: batch_x, model.Y: batch_y})" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 1563/1563 [10:14<00:00, 2.54it/s, accuracy=0.118, cost=6.63]\n", + "minibatch loop: 100%|██████████| 40/40 [00:06<00:00, 5.94it/s, accuracy=0.145, cost=6.39]\n", + "minibatch loop: 0%| | 0/1563 [00:00 
3])\n", + " results.extend(result)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "rights = []\n", + "for r in test_Y:\n", + " rights.append([i for i in r if i > 3])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0095551545" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bleu_hook.compute_bleu(reference_corpus = rights,\n", + " translation_corpus = results)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/README.md b/neural-machine-translation/README.md index 885c227..ef5233e 100644 --- a/neural-machine-translation/README.md +++ b/neural-machine-translation/README.md @@ -1,63 +1,83 @@ -## How-to +## How-to -1. Run any notebook using Jupyter Notebook. +1. Run [prepare-dataset.ipynb](prepare-dataset.ipynb). +2. Run [prepare-bpe.ipynb](prepare-bpe.ipynb). +3. Run [prepare-t2t.ipynb](prepare-t2t.ipynb). + +## Notes + +1. The first 200k examples of the train set are used for training; the validation and test sets are used for testing. +2. Every model is trained for 20 epochs. +3. Accuracy is measured by BLEU score. +4. RNN and Transformer hyperparameters are not consistent. + +For RNN, + +```python +size_layer = 512 +num_layers = 2 +``` + +For the Transformer, we use the BASE hyperparameters from Tensor2Tensor.
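To make the inconsistency concrete: the RNN configuration comes from the note above, while the Transformer values below are the well-known `transformer_base` settings shipped with Tensor2Tensor (listed here for comparison; they are not taken from these notebooks):

```python
# RNN configuration used by the RNN notebooks (from the note above).
rnn_config = {"size_layer": 512, "num_layers": 2}

# Tensor2Tensor's transformer_base hyperparameters (the standard BASE
# settings): same hidden width, but three times as many layers plus
# 2048-wide feed-forward blocks and 8 attention heads.
transformer_base = {
    "hidden_size": 512,
    "num_hidden_layers": 6,
    "filter_size": 2048,
    "num_heads": 8,
}

print(transformer_base["num_hidden_layers"] // rnn_config["num_layers"])
```

So BLEU differences between the RNN and Transformer rows below partly reflect model capacity, not only architecture.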
+ +We never tested how the RNN-based models behave if the number and size of layers are increased to match the Transformer BASE hyperparameters. + +5. Batch size is not consistent; most models used a batch size of 128. ## Accuracy, not sorted -Based on 20 epochs accuracy. The results will be different on different dataset. Trained on a GTX 960, 4GB VRAM. - -| name | accuracy | -|------------------------------------------------------------|----------| -| 1.basic-seq2seq-manual | 0.915255 | -| 2.lstm-seq2seq-manual | 0.917009 | -| 3.gru-seq2seq-manual | 0.920200 | -| 4.basic-seq2seq-api-greedy | 0.960998 | -| 5.lstm-seq2seq-api-greedy | 0.202590 | -| 6.gru-seq2seq-greedy | 0.408099 | -| 7.basic-birnn-seq2seq-manual | 0.919491 | -| 8.lstm-birnn-seq2seq-manual | 0.918473 | -| 9.gru-birnn-seq2seq-manual | 0.922818 | -| 10.basic-birnn-seq2seq-greedy | 0.957355 | -| 11.lstm-birnn-seq2seq-greedy | 0.202628 | -| 12.gru-birnn-seq2seq-greedy | 0.484461 | -| 13.basic-seq2seq-luong | 0.916100 | -| 14.lstm-seq2seq-luong | 0.917736 | -| 15.gru-seq2seq-luong | 0.919482 | -| 16.basic-seq2seq-bahdanau | 0.915700 | -| 17.lstm-seq2seq-bahdanau | 0.721833 | -| 18.gru-seq2seq-bahdanau | 0.919218 | -| 19.lstm-birnn-seq2seq-luong | 0.918555 | -| 20.gru-birnn-seq2seq-luong | 0.919445 | -| 21.lstm-birnn-seq2seq-bahdanau | 0.917655 | -| 22.gru-birnn-seq2seq-bahdanau | 0.920555 | -| 23.lstm-birnn-seq2seq-bahdanau-luong | 0.918182 | -| 24.gru-birnn-seq2seq-bahdanau-luong | 0.920045 | -| 25.lstm-seq2seq-greedy-luong | 0.364322 | -| 26.gru-seq2seq-greedy-luong | 0.627814 | -| 27.lstm-seq2seq-greedy-bahdanau | 0.378199 | -| 28.gru-seq2seq-greedy-bahdanau | 0.470696 | -| 29.lstm-seq2seq-beam | 0.122135 | -| 30.gru-seq2seq-beam | 0.163046 | -| 31.lstm-birnn-seq2seq-beam-luong | 0.171741 | -| 32.gru-birnn-seq2seq-beam-luong | 0.189787 | -| 33.lstm-birnn-seq2seq-luong-bahdanau-stack-beam | 0.098961 | -| 34.gru-birnn-seq2seq-luong-bahdanau-stack-beam | 0.091473 | -| 35.byte-net | 1.022409 | -|
36.estimator | | -| 37.capsule-lstm-seq2seq-greedy | | -| 38.capsule-lstm-seq2seq-luong-beam | | -| 39.lstm-birnn-seq2seq-luong-bahdanau-stack-beam-dropout-l2 | 0.066305 | -| 40.dnc-seq2seq-bahdanau-greedy | 0.711184 | -| 41.lstm-birnn-seq2seq-beam-luongmonotic | 0.624756 | -| 42.lstm-birnn-seq2seq-beam-bahdanaumonotic | 0.624756 | -| 43.memory-network-basic | 0.965700 | -| 44.memory-network-lstm | 0.942591 | -| 45.attention-is-all-you-need | 0.170279 | -| 46.transformer-xl | 0.114907 | -| 47.attention-is-all-you-need-beam-search | 0.158205 | -| 48.conv-encoder-conv-decoder | 0.462655 | -| 49.conv-encoder-lstm | 0.438702 | -| 50.byte-net-greedy.ipynb | 1.023528 | -| 51.gru-birnn-seq2seq-greedy-residual.ipynb | 0.561457 | -| 52.google-nmt.ipynb | 0.675990 | -| 53.dilated-seq2seq.ipynb | 1.023615 | +| notebook | BLEU | +|--------------------------------------------------------------|---------------| +| 1.basic-seq2seq.ipynb | 6.319555e-05 | +| 2.lstm-seq2seq.ipynb | 0.016924812 | +| 3.gru-seq2seq.ipynb | 0.0094467895 | +| 4.basic-seq2seq-contrib-greedy.ipynb | 0.005418866 | +| 5.lstm-seq2seq-contrib-greedy.ipynb | | +| 6.gru-seq2seq-contrib-greedy.ipynb | 0.051461186 | +| 7.basic-birnn-seq2seq.ipynb | 6.319555e-05 | +| 8.lstm-birnn-seq2seq.ipynb | 0.012854616 | +| 9.gru-birnn-seq2seq.ipynb | 0.0095551545 | +| 10.basic-birnn-seq2seq-contrib-greedy.ipynb | 0.019748569 | +| 11.lstm-birnn-seq2seq-contrib-greedy.ipynb | 0.052993 | +| 12.gru-birnn-seq2seq-contrib-greedy.ipynb | 0.047413725 | +| 13.basic-seq2seq-luong.ipynb | 8.97118e-05 | +| 14.lstm-seq2seq-luong.ipynb | 0.053475615 | +| 15.gru-seq2seq-luong.ipynb | 0.01888038 | +| 16.basic-seq2seq-bahdanau.ipynb | 0.00020161743 | +| 17.lstm-seq2seq-bahdanau.ipynb | 0.048261568 | +| 18.gru-seq2seq-bahdanau.ipynb | 0.025584696 | +| 19.basic-birnn-seq2seq-bahdanau.ipynb | 0.00020161743 | +| 20.lstm-birnn-seq2seq-bahdanau.ipynb | 0.054097746 | +| 21.gru-birnn-seq2seq-bahdanau.ipynb | 0.00020161743 | +| 
22.basic-birnn-seq2seq-luong.ipynb | | +| 23.lstm-birnn-seq2seq-luong.ipynb | 0.05320787 | +| 24.gru-birnn-seq2seq-luong.ipynb | 0.027758315 | +| 25.lstm-seq2seq-contrib-greedy-luong.ipynb | 0.15195806 | +| 26.gru-seq2seq-contrib-greedy-luong.ipynb | 0.101576895 | +| 27.lstm-seq2seq-contrib-greedy-bahdanau.ipynb | 0.15275387 | +| 28.gru-seq2seq-contrib-greedy-bahdanau.ipynb | 0.13868862 | +| 29.lstm-seq2seq-contrib-beam-luong.ipynb | 0.17535137 | +| 30.gru-seq2seq-contrib-beam-luong.ipynb | 0.003980886 | +| 31.lstm-seq2seq-contrib-beam-bahdanau.ipynb | 0.17929372 | +| 32.gru-seq2seq-contrib-beam-bahdanau.ipynb | 0.1767827 | +| 33.lstm-birnn-seq2seq-contrib-beam-bahdanau.ipynb | 0.19480321 | +| 34.lstm-birnn-seq2seq-contrib-beam-luong.ipynb | 0.20042004 | +| 35.gru-birnn-seq2seq-contrib-beam-bahdanau.ipynb | 0.1784567 | +| 36.gru-birnn-seq2seq-contrib-beam-luong.ipynb | 0.0557322 | +| 37.lstm-birnn-seq2seq-contrib-beam-luongmonotonic.ipynb | 0.06368613 | +| 38.gru-birnn-seq2seq-contrib-beam-luongmonotic.ipynb | 0.06407658 | +| 39.lstm-birnn-seq2seq-contrib-beam-bahdanaumonotonic.ipynb | 0.17586066 | +| 40.gru-birnn-seq2seq-contrib-beam-bahdanaumonotic.ipynb | 0.065290846 | +| 41.residual-lstm-seq2seq-greedy-luong.ipynb | 0.1475228 | +| 42.residual-gru-seq2seq-greedy-luong.ipynb | 5.0574585e-05 | +| 43.residual-lstm-seq2seq-greedy-bahdanau.ipynb | 0.15493448 | +| 44.residual-gru-seq2seq-greedy-bahdanau.ipynb | | +| 45.memory-network-lstm-decoder-greedy.ipynb | | +| 46.google-nmt.ipynb | 0.055380445 | +| 47.transformer-encoder-transformer-decoder.ipynb | 0.17100729 | +| 48.transformer-encoder-lstm-decoder-greedy.ipynb | 0.049064703 | +| 49.bertmultilanguage-encoder-bertmultilanguage-decoder.ipynb | 0.37003958 | +| 50.bertmultilanguage-encoder-lstm-decoder.ipynb | 0.11384286 | +| 51.bertmultilanguage-encoder-transformer-decoder.ipynb | 0.3941662 | +| 52.bertenglish-encoder-transformer-decoder.ipynb | 0.23225775 | +| 53.transformer-t2t-2gpu.ipynb | 0.36773485 | \ No 
newline at end of file diff --git a/neural-machine-translation/access.py b/neural-machine-translation/access.py deleted file mode 100644 index f4a8433..0000000 --- a/neural-machine-translation/access.py +++ /dev/null @@ -1,318 +0,0 @@ -# Copyright 2017 Google Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""DNC access modules.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections -import sonnet as snt -import tensorflow as tf - -import addressing -import util - -AccessState = collections.namedtuple('AccessState', ( - 'memory', 'read_weights', 'write_weights', 'linkage', 'usage')) - - -def _erase_and_write(memory, address, reset_weights, values): - """Module to erase and write in the external memory. - - Erase operation: - M_t'(i) = M_{t-1}(i) * (1 - w_t(i) * e_t) - - Add operation: - M_t(i) = M_t'(i) + w_t(i) * a_t - - where e are the reset_weights, w the write weights and a the values. - - Args: - memory: 3-D tensor of shape `[batch_size, memory_size, word_size]`. - address: 3-D tensor `[batch_size, num_writes, memory_size]`. - reset_weights: 3-D tensor `[batch_size, num_writes, word_size]`. - values: 3-D tensor `[batch_size, num_writes, word_size]`. - - Returns: - 3-D tensor of shape `[batch_size, num_writes, word_size]`. 
- """ - with tf.name_scope('erase_memory', values=[memory, address, reset_weights]): - expand_address = tf.expand_dims(address, 3) - reset_weights = tf.expand_dims(reset_weights, 2) - weighted_resets = expand_address * reset_weights - reset_gate = tf.reduce_prod(1 - weighted_resets, [1]) - memory *= reset_gate - - with tf.name_scope('additive_write', values=[memory, address, values]): - add_matrix = tf.matmul(address, values, adjoint_a=True) - memory += add_matrix - - return memory - - -class MemoryAccess(snt.RNNCore): - """Access module of the Differentiable Neural Computer. - - This memory module supports multiple read and write heads. It makes use of: - - * `addressing.TemporalLinkage` to track the temporal ordering of writes in - memory for each write head. - * `addressing.FreenessAllocator` for keeping track of memory usage, where - usage increase when a memory location is written to, and decreases when - memory is read from that the controller says can be freed. - - Write-address selection is done by an interpolation between content-based - lookup and using unused memory. - - Read-address selection is done by an interpolation of content-based lookup - and following the link graph in the forward or backwards read direction. - """ - - def __init__(self, - memory_size=128, - word_size=20, - num_reads=1, - num_writes=1, - name='memory_access'): - """Creates a MemoryAccess module. - - Args: - memory_size: The number of memory slots (N in the DNC paper). - word_size: The width of each memory slot (W in the DNC paper) - num_reads: The number of read heads (R in the DNC paper). - num_writes: The number of write heads (fixed at 1 in the paper). - name: The name of the module. 
- """ - super(MemoryAccess, self).__init__(name=name) - self._memory_size = memory_size - self._word_size = word_size - self._num_reads = num_reads - self._num_writes = num_writes - - self._write_content_weights_mod = addressing.CosineWeights( - num_writes, word_size, name='write_content_weights') - self._read_content_weights_mod = addressing.CosineWeights( - num_reads, word_size, name='read_content_weights') - - self._linkage = addressing.TemporalLinkage(memory_size, num_writes) - self._freeness = addressing.Freeness(memory_size) - - def _build(self, inputs, prev_state): - """Connects the MemoryAccess module into the graph. - - Args: - inputs: tensor of shape `[batch_size, input_size]`. This is used to - control this access module. - prev_state: Instance of `AccessState` containing the previous state. - - Returns: - A tuple `(output, next_state)`, where `output` is a tensor of shape - `[batch_size, num_reads, word_size]`, and `next_state` is the new - `AccessState` named tuple at the current time t. - """ - inputs = self._read_inputs(inputs) - - # Update usage using inputs['free_gate'] and previous read & write weights. - usage = self._freeness( - write_weights=prev_state.write_weights, - free_gate=inputs['free_gate'], - read_weights=prev_state.read_weights, - prev_usage=prev_state.usage) - - # Write to memory. - write_weights = self._write_weights(inputs, prev_state.memory, usage) - memory = _erase_and_write( - prev_state.memory, - address=write_weights, - reset_weights=inputs['erase_vectors'], - values=inputs['write_vectors']) - - linkage_state = self._linkage(write_weights, prev_state.linkage) - - # Read from memory. 
- read_weights = self._read_weights( - inputs, - memory=memory, - prev_read_weights=prev_state.read_weights, - link=linkage_state.link) - read_words = tf.matmul(read_weights, memory) - - return (read_words, AccessState( - memory=memory, - read_weights=read_weights, - write_weights=write_weights, - linkage=linkage_state, - usage=usage)) - - def _read_inputs(self, inputs): - """Applies transformations to `inputs` to get control for this module.""" - - def _linear(first_dim, second_dim, name, activation=None): - """Returns a linear transformation of `inputs`, followed by a reshape.""" - linear = snt.Linear(first_dim * second_dim, name=name)(inputs) - if activation is not None: - linear = activation(linear, name=name + '_activation') - return tf.reshape(linear, [-1, first_dim, second_dim]) - - # v_t^i - The vectors to write to memory, for each write head `i`. - write_vectors = _linear(self._num_writes, self._word_size, 'write_vectors') - - # e_t^i - Amount to erase the memory by before writing, for each write head. - erase_vectors = _linear(self._num_writes, self._word_size, 'erase_vectors', - tf.sigmoid) - - # f_t^j - Amount that the memory at the locations read from at the previous - # time step can be declared unused, for each read head `j`. - free_gate = tf.sigmoid( - snt.Linear(self._num_reads, name='free_gate')(inputs)) - - # g_t^{a, i} - Interpolation between writing to unallocated memory and - # content-based lookup, for each write head `i`. Note: `a` is simply used to - # identify this gate with allocation vs writing (as defined below). - allocation_gate = tf.sigmoid( - snt.Linear(self._num_writes, name='allocation_gate')(inputs)) - - # g_t^{w, i} - Overall gating of write amount for each write head. - write_gate = tf.sigmoid( - snt.Linear(self._num_writes, name='write_gate')(inputs)) - - # \pi_t^j - Mixing between "backwards" and "forwards" positions (for - # each write head), and content-based lookup, for each read head. 
- num_read_modes = 1 + 2 * self._num_writes - read_mode = snt.BatchApply(tf.nn.softmax)( - _linear(self._num_reads, num_read_modes, name='read_mode')) - - # Parameters for the (read / write) "weights by content matching" modules. - write_keys = _linear(self._num_writes, self._word_size, 'write_keys') - write_strengths = snt.Linear(self._num_writes, name='write_strengths')( - inputs) - - read_keys = _linear(self._num_reads, self._word_size, 'read_keys') - read_strengths = snt.Linear(self._num_reads, name='read_strengths')(inputs) - - result = { - 'read_content_keys': read_keys, - 'read_content_strengths': read_strengths, - 'write_content_keys': write_keys, - 'write_content_strengths': write_strengths, - 'write_vectors': write_vectors, - 'erase_vectors': erase_vectors, - 'free_gate': free_gate, - 'allocation_gate': allocation_gate, - 'write_gate': write_gate, - 'read_mode': read_mode, - } - return result - - def _write_weights(self, inputs, memory, usage): - """Calculates the memory locations to write to. - - This uses a combination of content-based lookup and finding an unused - location in memory, for each write head. - - Args: - inputs: Collection of inputs to the access module, including controls for - how to chose memory writing, such as the content to look-up and the - weighting between content-based and allocation-based addressing. - memory: A tensor of shape `[batch_size, memory_size, word_size]` - containing the current memory contents. - usage: Current memory usage, which is a tensor of shape `[batch_size, - memory_size]`, used for allocation-based addressing. - - Returns: - tensor of shape `[batch_size, num_writes, memory_size]` indicating where - to write to (if anywhere) for each write head. - """ - with tf.name_scope('write_weights', values=[inputs, memory, usage]): - # c_t^{w, i} - The content-based weights for each write head. 
- write_content_weights = self._write_content_weights_mod( - memory, inputs['write_content_keys'], - inputs['write_content_strengths']) - - # a_t^i - The allocation weights for each write head. - write_allocation_weights = self._freeness.write_allocation_weights( - usage=usage, - write_gates=(inputs['allocation_gate'] * inputs['write_gate']), - num_writes=self._num_writes) - - # Expands gates over memory locations. - allocation_gate = tf.expand_dims(inputs['allocation_gate'], -1) - write_gate = tf.expand_dims(inputs['write_gate'], -1) - - # w_t^{w, i} - The write weightings for each write head. - return write_gate * (allocation_gate * write_allocation_weights + - (1 - allocation_gate) * write_content_weights) - - def _read_weights(self, inputs, memory, prev_read_weights, link): - """Calculates read weights for each read head. - - The read weights are a combination of following the link graphs in the - forward or backward directions from the previous read position, and doing - content-based lookup. The interpolation between these different modes is - done by `inputs['read_mode']`. - - Args: - inputs: Controls for this access module. This contains the content-based - keys to lookup, and the weightings for the different read modes. - memory: A tensor of shape `[batch_size, memory_size, word_size]` - containing the current memory contents to do content-based lookup. - prev_read_weights: A tensor of shape `[batch_size, num_reads, - memory_size]` containing the previous read locations. - link: A tensor of shape `[batch_size, num_writes, memory_size, - memory_size]` containing the temporal write transition graphs. - - Returns: - A tensor of shape `[batch_size, num_reads, memory_size]` containing the - read weights for each read head. - """ - with tf.name_scope( - 'read_weights', values=[inputs, memory, prev_read_weights, link]): - # c_t^{r, i} - The content weightings for each read head. 
- content_weights = self._read_content_weights_mod( - memory, inputs['read_content_keys'], inputs['read_content_strengths']) - - # Calculates f_t^i and b_t^i. - forward_weights = self._linkage.directional_read_weights( - link, prev_read_weights, forward=True) - backward_weights = self._linkage.directional_read_weights( - link, prev_read_weights, forward=False) - - backward_mode = inputs['read_mode'][:, :, :self._num_writes] - forward_mode = ( - inputs['read_mode'][:, :, self._num_writes:2 * self._num_writes]) - content_mode = inputs['read_mode'][:, :, 2 * self._num_writes] - - read_weights = ( - tf.expand_dims(content_mode, 2) * content_weights + tf.reduce_sum( - tf.expand_dims(forward_mode, 3) * forward_weights, 2) + - tf.reduce_sum(tf.expand_dims(backward_mode, 3) * backward_weights, 2)) - - return read_weights - - @property - def state_size(self): - """Returns a tuple of the shape of the state tensors.""" - return AccessState( - memory=tf.TensorShape([self._memory_size, self._word_size]), - read_weights=tf.TensorShape([self._num_reads, self._memory_size]), - write_weights=tf.TensorShape([self._num_writes, self._memory_size]), - linkage=self._linkage.state_size, - usage=self._freeness.state_size) - - @property - def output_size(self): - """Returns the output shape.""" - return tf.TensorShape([self._num_reads, self._word_size]) diff --git a/neural-machine-translation/addressing.py b/neural-machine-translation/addressing.py deleted file mode 100644 index 77a88e8..0000000 --- a/neural-machine-translation/addressing.py +++ /dev/null @@ -1,410 +0,0 @@ -# Copyright 2017 Google Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""DNC addressing modules.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections -import sonnet as snt -import tensorflow as tf - -import util - -# Ensure values are greater than epsilon to avoid numerical instability. -_EPSILON = 1e-6 - -TemporalLinkageState = collections.namedtuple('TemporalLinkageState', - ('link', 'precedence_weights')) - - -def _vector_norms(m): - squared_norms = tf.reduce_sum(m * m, axis=2, keep_dims=True) - return tf.sqrt(squared_norms + _EPSILON) - - -def weighted_softmax(activations, strengths, strengths_op): - """Returns softmax over activations multiplied by positive strengths. - - Args: - activations: A tensor of shape `[batch_size, num_heads, memory_size]`, of - activations to be transformed. Softmax is taken over the last dimension. - strengths: A tensor of shape `[batch_size, num_heads]` containing strengths to - multiply by the activations prior to the softmax. - strengths_op: An operation to transform strengths before softmax. - - Returns: - A tensor of same shape as `activations` with weighted softmax applied. - """ - transformed_strengths = tf.expand_dims(strengths_op(strengths), -1) - sharp_activations = activations * transformed_strengths - softmax = snt.BatchApply(module_or_op=tf.nn.softmax) - return softmax(sharp_activations) - - -class CosineWeights(snt.AbstractModule): - """Cosine-weighted attention. 
- - Calculates the cosine similarity between a query and each word in memory, then - applies a weighted softmax to return a sharp distribution. - """ - - def __init__(self, - num_heads, - word_size, - strength_op=tf.nn.softplus, - name='cosine_weights'): - """Initializes the CosineWeights module. - - Args: - num_heads: number of memory heads. - word_size: memory word size. - strength_op: operation to apply to strengths (default is tf.nn.softplus). - name: module name (default 'cosine_weights') - """ - super(CosineWeights, self).__init__(name=name) - self._num_heads = num_heads - self._word_size = word_size - self._strength_op = strength_op - - def _build(self, memory, keys, strengths): - """Connects the CosineWeights module into the graph. - - Args: - memory: A 3-D tensor of shape `[batch_size, memory_size, word_size]`. - keys: A 3-D tensor of shape `[batch_size, num_heads, word_size]`. - strengths: A 2-D tensor of shape `[batch_size, num_heads]`. - - Returns: - Weights tensor of shape `[batch_size, num_heads, memory_size]`. - """ - # Calculates the inner product between the query vector and words in memory. - dot = tf.matmul(keys, memory, adjoint_b=True) - - # Outer product to compute denominator (euclidean norm of query and memory). - memory_norms = _vector_norms(memory) - key_norms = _vector_norms(keys) - norm = tf.matmul(key_norms, memory_norms, adjoint_b=True) - - # Calculates cosine similarity between the query vector and words in memory. - similarity = dot / (norm + _EPSILON) - - return weighted_softmax(similarity, strengths, self._strength_op) - - -class TemporalLinkage(snt.RNNCore): - """Keeps track of write order for forward and backward addressing. 
- - This is a pseudo-RNNCore module, whose state is a pair `(link, - precedence_weights)`, where `link` is a (collection of) graphs for (possibly - multiple) write heads (represented by a tensor with values in the range - [0, 1]), and `precedence_weights` records the "previous write locations" used - to build the link graphs. - - The function `directional_read_weights` computes addresses following the - forward and backward directions in the link graphs. - """ - - def __init__(self, memory_size, num_writes, name='temporal_linkage'): - """Construct a TemporalLinkage module. - - Args: - memory_size: The number of memory slots. - num_writes: The number of write heads. - name: Name of the module. - """ - super(TemporalLinkage, self).__init__(name=name) - self._memory_size = memory_size - self._num_writes = num_writes - - def _build(self, write_weights, prev_state): - """Calculate the updated linkage state given the write weights. - - Args: - write_weights: A tensor of shape `[batch_size, num_writes, memory_size]` - containing the memory addresses of the different write heads. - prev_state: `TemporalLinkageState` tuple containg a tensor `link` of - shape `[batch_size, num_writes, memory_size, memory_size]`, and a - tensor `precedence_weights` of shape `[batch_size, num_writes, - memory_size]` containing the aggregated history of recent writes. - - Returns: - A `TemporalLinkageState` tuple `next_state`, which contains the updated - link and precedence weights. - """ - link = self._link(prev_state.link, prev_state.precedence_weights, - write_weights) - precedence_weights = self._precedence_weights(prev_state.precedence_weights, - write_weights) - return TemporalLinkageState( - link=link, precedence_weights=precedence_weights) - - def directional_read_weights(self, link, prev_read_weights, forward): - """Calculates the forward or the backward read weights. - - For each read head (at a given address), there are `num_writes` link graphs - to follow. 
Thus this function computes a read address for each of the - `num_reads * num_writes` pairs of read and write heads. - - Args: - link: tensor of shape `[batch_size, num_writes, memory_size, - memory_size]` representing the link graphs L_t. - prev_read_weights: tensor of shape `[batch_size, num_reads, - memory_size]` containing the previous read weights w_{t-1}^r. - forward: Boolean indicating whether to follow the "future" direction in - the link graph (True) or the "past" direction (False). - - Returns: - tensor of shape `[batch_size, num_reads, num_writes, memory_size]` - """ - with tf.name_scope('directional_read_weights'): - # We calculate the forward and backward directions for each pair of - # read and write heads; hence we need to tile the read weights and do a - # sort of "outer product" to get this. - expanded_read_weights = tf.stack([prev_read_weights] * self._num_writes, - 1) - result = tf.matmul(expanded_read_weights, link, adjoint_b=forward) - # Swap dimensions 1, 2 so order is [batch, reads, writes, memory]: - return tf.transpose(result, perm=[0, 2, 1, 3]) - - def _link(self, prev_link, prev_precedence_weights, write_weights): - """Calculates the new link graphs. - - For each write head, the link is a directed graph (represented by a matrix - with entries in range [0, 1]) whose vertices are the memory locations, and - an edge indicates temporal ordering of writes. - - Args: - prev_link: A tensor of shape `[batch_size, num_writes, memory_size, - memory_size]` representing the previous link graphs for each write - head. - prev_precedence_weights: A tensor of shape `[batch_size, num_writes, - memory_size]` which is the previous "aggregated" write weights for - each write head. - write_weights: A tensor of shape `[batch_size, num_writes, memory_size]` - containing the new locations in memory written to. - - Returns: - A tensor of shape `[batch_size, num_writes, memory_size, memory_size]` - containing the new link graphs for each write head. 
- """ - with tf.name_scope('link'): - batch_size = prev_link.get_shape()[0].value - write_weights_i = tf.expand_dims(write_weights, 3) - write_weights_j = tf.expand_dims(write_weights, 2) - prev_precedence_weights_j = tf.expand_dims(prev_precedence_weights, 2) - prev_link_scale = 1 - write_weights_i - write_weights_j - new_link = write_weights_i * prev_precedence_weights_j - link = prev_link_scale * prev_link + new_link - # Return the link with the diagonal set to zero, to remove self-looping - # edges. - return tf.matrix_set_diag( - link, - tf.zeros( - [batch_size, self._num_writes, self._memory_size], - dtype=link.dtype)) - - def _precedence_weights(self, prev_precedence_weights, write_weights): - """Calculates the new precedence weights given the current write weights. - - The precedence weights are the "aggregated write weights" for each write - head, where write weights with sum close to zero will leave the precedence - weights unchanged, but with sum close to one will replace the precedence - weights. - - Args: - prev_precedence_weights: A tensor of shape `[batch_size, num_writes, - memory_size]` containing the previous precedence weights. - write_weights: A tensor of shape `[batch_size, num_writes, memory_size]` - containing the new write weights. - - Returns: - A tensor of shape `[batch_size, num_writes, memory_size]` containing the - new precedence weights. - """ - with tf.name_scope('precedence_weights'): - write_sum = tf.reduce_sum(write_weights, 2, keep_dims=True) - return (1 - write_sum) * prev_precedence_weights + write_weights - - @property - def state_size(self): - """Returns a `TemporalLinkageState` tuple of the state tensors' shapes.""" - return TemporalLinkageState( - link=tf.TensorShape( - [self._num_writes, self._memory_size, self._memory_size]), - precedence_weights=tf.TensorShape([self._num_writes, - self._memory_size]),) - - -class Freeness(snt.RNNCore): - """Memory usage that is increased by writing and decreased by reading. 
- - This module is a pseudo-RNNCore whose state is a tensor with values in - the range [0, 1] indicating the usage of each of `memory_size` memory slots. - - The usage is: - - * Increased by writing, where usage is increased towards 1 at the write - addresses. - * Decreased by reading, where usage is decreased after reading from a - location when free_gate is close to 1. - - The function `write_allocation_weights` can be invoked to get free locations - to write to for a number of write heads. - """ - - def __init__(self, memory_size, name='freeness'): - """Creates a Freeness module. - - Args: - memory_size: Number of memory slots. - name: Name of the module. - """ - super(Freeness, self).__init__(name=name) - self._memory_size = memory_size - - def _build(self, write_weights, free_gate, read_weights, prev_usage): - """Calculates the new memory usage u_t. - - Memory that was written to in the previous time step will have its usage - increased; memory that was read from and the controller says can be "freed" - will have its usage decreased. - - Args: - write_weights: tensor of shape `[batch_size, num_writes, - memory_size]` giving write weights at previous time step. - free_gate: tensor of shape `[batch_size, num_reads]` which indicates - which read heads read memory that can now be freed. - read_weights: tensor of shape `[batch_size, num_reads, - memory_size]` giving read weights at previous time step. - prev_usage: tensor of shape `[batch_size, memory_size]` giving - usage u_{t - 1} at the previous time step, with entries in range - [0, 1]. - - Returns: - tensor of shape `[batch_size, memory_size]` representing updated memory - usage. - """ - # Calculation of usage is not differentiable with respect to write weights. 
- write_weights = tf.stop_gradient(write_weights) - usage = self._usage_after_write(prev_usage, write_weights) - usage = self._usage_after_read(usage, free_gate, read_weights) - return usage - - def write_allocation_weights(self, usage, write_gates, num_writes): - """Calculates freeness-based locations for writing to. - - This finds unused memory by ranking the memory locations by usage, for each - write head. (For more than one write head, we use a "simulated new usage" - which takes into account the fact that the previous write head will increase - the usage in that area of the memory.) - - Args: - usage: A tensor of shape `[batch_size, memory_size]` representing - current memory usage. - write_gates: A tensor of shape `[batch_size, num_writes]` with values in - the range [0, 1] indicating how much each write head does writing - based on the address returned here (and hence how much usage - increases). - num_writes: The number of write heads to calculate write weights for. - - Returns: - tensor of shape `[batch_size, num_writes, memory_size]` containing the - freeness-based write locations. Note that this isn't scaled by - `write_gate`; this scaling must be applied externally. - """ - with tf.name_scope('write_allocation_weights'): - # expand gatings over memory locations - write_gates = tf.expand_dims(write_gates, -1) - - allocation_weights = [] - for i in range(num_writes): - allocation_weights.append(self._allocation(usage)) - # update usage to take into account writing to this new allocation - usage += ((1 - usage) * write_gates[:, i, :] * allocation_weights[i]) - - # Pack the allocation weights for the write heads into one tensor. - return tf.stack(allocation_weights, axis=1) - - def _usage_after_write(self, prev_usage, write_weights): - """Calcualtes the new usage after writing to memory. - - Args: - prev_usage: tensor of shape `[batch_size, memory_size]`. - write_weights: tensor of shape `[batch_size, num_writes, memory_size]`. 
- - Returns: - New usage, a tensor of shape `[batch_size, memory_size]`. - """ - with tf.name_scope('usage_after_write'): - # Calculate the aggregated effect of all write heads - write_weights = 1 - tf.reduce_prod(1 - write_weights, [1]) - return prev_usage + (1 - prev_usage) * write_weights - - def _usage_after_read(self, prev_usage, free_gate, read_weights): - """Calcualtes the new usage after reading and freeing from memory. - - Args: - prev_usage: tensor of shape `[batch_size, memory_size]`. - free_gate: tensor of shape `[batch_size, num_reads]` with entries in the - range [0, 1] indicating the amount that locations read from can be - freed. - read_weights: tensor of shape `[batch_size, num_reads, memory_size]`. - - Returns: - New usage, a tensor of shape `[batch_size, memory_size]`. - """ - with tf.name_scope('usage_after_read'): - free_gate = tf.expand_dims(free_gate, -1) - free_read_weights = free_gate * read_weights - phi = tf.reduce_prod(1 - free_read_weights, [1], name='phi') - return prev_usage * phi - - def _allocation(self, usage): - r"""Computes allocation by sorting `usage`. - - This corresponds to the value a = a_t[\phi_t[j]] in the paper. - - Args: - usage: tensor of shape `[batch_size, memory_size]` indicating current - memory usage. This is equal to u_t in the paper when we only have one - write head, but for multiple write heads, one should update the usage - while iterating through the write heads to take into account the - allocation returned by this function. - - Returns: - Tensor of shape `[batch_size, memory_size]` corresponding to allocation. - """ - with tf.name_scope('allocation'): - # Ensure values are not too small prior to cumprod. 
- usage = _EPSILON + (1 - _EPSILON) * usage - - nonusage = 1 - usage - sorted_nonusage, indices = tf.nn.top_k( - nonusage, k=self._memory_size, name='sort') - sorted_usage = 1 - sorted_nonusage - prod_sorted_usage = tf.cumprod(sorted_usage, axis=1, exclusive=True) - sorted_allocation = sorted_nonusage * prod_sorted_usage - inverse_indices = util.batch_invert_permutation(indices) - - # This final line "unsorts" sorted_allocation, so that the indexing - # corresponds to the original indexing of `usage`. - return util.batch_gather(sorted_allocation, inverse_indices) - - @property - def state_size(self): - """Returns the shape of the state tensor.""" - return tf.TensorShape([self._memory_size]) diff --git a/neural-machine-translation/bert_decoder.py b/neural-machine-translation/bert_decoder.py new file mode 100644 index 0000000..08269d8 --- /dev/null +++ b/neural-machine-translation/bert_decoder.py @@ -0,0 +1,1080 @@ +# coding=utf-8 +# Copyright 2018 The Google AI Language Team Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
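The freeness-based allocation removed in the DNC code above (sort locations by non-usage, take an exclusive cumulative product of sorted usage, then unsort) can be sketched in plain NumPy. This is a rough re-derivation, not the repo's code: `eps` stands in for the module's `_EPSILON` constant, which is assumed here to be a small stabilizer on the order of 1e-6.

```python
import numpy as np

def allocation(usage, eps=1e-6):
    # Clamp usage away from zero so the exclusive cumulative product is stable.
    usage = eps + (1 - eps) * usage
    nonusage = 1 - usage
    # Sort memory locations from most free to least free.
    indices = np.argsort(-nonusage, axis=1)
    sorted_nonusage = np.take_along_axis(nonusage, indices, axis=1)
    sorted_usage = 1 - sorted_nonusage
    # Exclusive cumprod: product of the usages of all freer locations.
    exclusive = np.concatenate(
        [np.ones_like(sorted_usage[:, :1]),
         np.cumprod(sorted_usage, axis=1)[:, :-1]],
        axis=1,
    )
    sorted_allocation = sorted_nonusage * exclusive
    # Invert the sort so weights line up with the original slot order.
    inverse = np.argsort(indices, axis=1)
    return np.take_along_axis(sorted_allocation, inverse, axis=1)

usage = np.array([[0.1, 0.9, 0.5]])
weights = allocation(usage)
# The freest slot (usage 0.1) gets the largest write-allocation weight.
```

The unsort step mirrors `util.batch_invert_permutation` followed by `util.batch_gather` in the original.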
+"""The main BERT model and related functions.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import copy +import json +import math +import re +import numpy as np +import six +import tensorflow as tf + + +class BertConfig(object): + """Configuration for `BertModel`.""" + + def __init__( + self, + vocab_size, + hidden_size = 768, + num_hidden_layers = 12, + num_attention_heads = 12, + intermediate_size = 3072, + hidden_act = 'gelu', + hidden_dropout_prob = 0.1, + attention_probs_dropout_prob = 0.1, + max_position_embeddings = 512, + type_vocab_size = 16, + initializer_range = 0.02, + ): + """Constructs BertConfig. + + Args: + vocab_size: Vocabulary size of `inputs_ids` in `BertModel`. + hidden_size: Size of the encoder layers and the pooler layer. + num_hidden_layers: Number of hidden layers in the Transformer encoder. + num_attention_heads: Number of attention heads for each attention layer in + the Transformer encoder. + intermediate_size: The size of the "intermediate" (i.e., feed-forward) + layer in the Transformer encoder. + hidden_act: The non-linear activation function (function or string) in the + encoder and pooler. + hidden_dropout_prob: The dropout probability for all fully connected + layers in the embeddings, encoder, and pooler. + attention_probs_dropout_prob: The dropout ratio for the attention + probabilities. + max_position_embeddings: The maximum sequence length that this model might + ever be used with. Typically set this to something large just in case + (e.g., 512 or 1024 or 2048). + type_vocab_size: The vocabulary size of the `token_type_ids` passed into + `BertModel`. + initializer_range: The stdev of the truncated_normal_initializer for + initializing all weight matrices. 
+ """ + self.vocab_size = vocab_size + self.hidden_size = hidden_size + self.num_hidden_layers = num_hidden_layers + self.num_attention_heads = num_attention_heads + self.hidden_act = hidden_act + self.intermediate_size = intermediate_size + self.hidden_dropout_prob = hidden_dropout_prob + self.attention_probs_dropout_prob = attention_probs_dropout_prob + self.max_position_embeddings = max_position_embeddings + self.type_vocab_size = type_vocab_size + self.initializer_range = initializer_range + + @classmethod + def from_dict(cls, json_object): + """Constructs a `BertConfig` from a Python dictionary of parameters.""" + config = BertConfig(vocab_size = None) + for (key, value) in six.iteritems(json_object): + config.__dict__[key] = value + return config + + @classmethod + def from_json_file(cls, json_file): + """Constructs a `BertConfig` from a json file of parameters.""" + with tf.gfile.GFile(json_file, 'r') as reader: + text = reader.read() + return cls.from_dict(json.loads(text)) + + def to_dict(self): + """Serializes this instance to a Python dictionary.""" + output = copy.deepcopy(self.__dict__) + return output + + def to_json_string(self): + """Serializes this instance to a JSON string.""" + return json.dumps(self.to_dict(), indent = 2, sort_keys = True) + '\n' + + +class BertModel(object): + """BERT model ("Bidirectional Encoder Representations from Transformers"). + + Example usage: + + ```python + # Already been converted into WordPiece token ids + input_ids = tf.constant([[31, 51, 99], [15, 5, 0]]) + input_mask = tf.constant([[1, 1, 1], [1, 1, 0]]) + token_type_ids = tf.constant([[0, 0, 1], [0, 2, 0]]) + + config = modeling.BertConfig(vocab_size=32000, hidden_size=512, + num_hidden_layers=8, num_attention_heads=6, intermediate_size=1024) + + model = modeling.BertModel(config=config, is_training=True, + input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids) + + label_embeddings = tf.get_variable(...) 
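`BertConfig`'s `from_dict`/`to_dict`/`to_json_string` methods form a simple serialization round trip. The sketch below uses a hypothetical `ConfigSketch` class with only two fields to show the pattern; it is not the repo's class, just the same dict-copy mechanics.

```python
import copy
import json

class ConfigSketch:
    """Minimal stand-in mirroring BertConfig's dict/JSON round trip."""

    def __init__(self, vocab_size, hidden_size=768):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    @classmethod
    def from_dict(cls, json_object):
        config = cls(vocab_size=None)
        for key, value in json_object.items():
            # Overwrite defaults with whatever the stored dict carries.
            config.__dict__[key] = value
        return config

    def to_dict(self):
        return copy.deepcopy(self.__dict__)

    def to_json_string(self):
        return json.dumps(self.to_dict(), indent=2, sort_keys=True) + '\n'

config = ConfigSketch(vocab_size=32000, hidden_size=512)
restored = ConfigSketch.from_dict(json.loads(config.to_json_string()))
```

Because `from_dict` writes straight into `__dict__`, any extra keys in a saved JSON config survive the round trip even if the constructor never named them.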
+ pooled_output = model.get_pooled_output() + logits = tf.matmul(pooled_output, label_embeddings) + ... + ``` + """ + + def __init__( + self, + config, + is_training, + input_ids, + memory, + memory_mask, + input_mask = None, + token_type_ids = None, + use_one_hot_embeddings = False, + scope = None, + ): + """Constructor for BertModel. + + Args: + config: `BertConfig` instance. + is_training: bool. true for training model, false for eval model. Controls + whether dropout will be applied. + input_ids: int32 Tensor of shape [batch_size, seq_length]. + input_mask: (optional) int32 Tensor of shape [batch_size, seq_length]. + token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length]. + use_one_hot_embeddings: (optional) bool. Whether to use one-hot word + embeddings or tf.embedding_lookup() for the word embeddings. + scope: (optional) variable scope. Defaults to "bert". + + Raises: + ValueError: The config is invalid or one of the input tensor shapes + is invalid. + """ + config = copy.deepcopy(config) + if not is_training: + config.hidden_dropout_prob = 0.0 + config.attention_probs_dropout_prob = 0.0 + + input_shape = get_shape_list(input_ids, expected_rank = 2) + batch_size = input_shape[0] + seq_length = input_shape[1] + + if input_mask is None: + input_mask = tf.ones( + shape = [batch_size, seq_length], dtype = tf.int32 + ) + + if token_type_ids is None: + token_type_ids = tf.zeros( + shape = [batch_size, seq_length], dtype = tf.int32 + ) + + with tf.variable_scope(scope, default_name = 'bert'): + with tf.variable_scope('embeddings'): + # Perform embedding lookup on the word ids. 
+ ( + self.embedding_output, + self.embedding_table, + ) = embedding_lookup( + input_ids = input_ids, + vocab_size = config.vocab_size, + embedding_size = config.hidden_size, + initializer_range = config.initializer_range, + word_embedding_name = 'word_embeddings', + use_one_hot_embeddings = use_one_hot_embeddings, + ) + + # Add positional embeddings and token type embeddings, then layer + # normalize and perform dropout. + self.embedding_output = embedding_postprocessor( + input_tensor = self.embedding_output, + use_token_type = True, + token_type_ids = token_type_ids, + token_type_vocab_size = config.type_vocab_size, + token_type_embedding_name = 'token_type_embeddings', + use_position_embeddings = True, + position_embedding_name = 'position_embeddings', + initializer_range = config.initializer_range, + max_position_embeddings = config.max_position_embeddings, + dropout_prob = config.hidden_dropout_prob, + ) + + with tf.variable_scope('encoder'): + # This converts a 2D mask of shape [batch_size, seq_length] to a 3D + # mask of shape [batch_size, seq_length, seq_length] which is used + # for the attention scores. + attention_mask = create_attention_mask_from_input_mask( + input_ids, memory_mask + ) + + # Run the stacked transformer. + # `sequence_output` shape = [batch_size, seq_length, hidden_size]. 
+ self.all_encoder_layers = transformer_model( + input_tensor = self.embedding_output, + memory = memory, + attention_mask = attention_mask, + hidden_size = config.hidden_size, + num_hidden_layers = config.num_hidden_layers, + num_attention_heads = config.num_attention_heads, + intermediate_size = config.intermediate_size, + intermediate_act_fn = get_activation(config.hidden_act), + hidden_dropout_prob = config.hidden_dropout_prob, + attention_probs_dropout_prob = config.attention_probs_dropout_prob, + initializer_range = config.initializer_range, + do_return_all_layers = True, + ) + + self.sequence_output = self.all_encoder_layers[-1] + # The "pooler" converts the encoded sequence tensor of shape + # [batch_size, seq_length, hidden_size] to a tensor of shape + # [batch_size, hidden_size]. This is necessary for segment-level + # (or segment-pair-level) classification tasks where we need a fixed + # dimensional representation of the segment. + with tf.variable_scope('pooler'): + # We "pool" the model by simply taking the hidden state corresponding + # to the first token. We assume that this has been pre-trained + first_token_tensor = tf.squeeze( + self.sequence_output[:, 0:1, :], axis = 1 + ) + self.pooled_output = tf.layers.dense( + first_token_tensor, + config.hidden_size, + activation = tf.tanh, + kernel_initializer = create_initializer( + config.initializer_range + ), + ) + + def get_pooled_output(self): + return self.pooled_output + + def get_sequence_output(self): + """Gets final hidden layer of encoder. + + Returns: + float Tensor of shape [batch_size, seq_length, hidden_size] corresponding + to the final hidden of the transformer encoder. + """ + return self.sequence_output + + def get_all_encoder_layers(self): + return self.all_encoder_layers + + def get_embedding_output(self): + """Gets output of the embedding lookup (i.e., input to the transformer). 
+
+        Returns:
+          float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
+          to the output of the embedding layer, after summing the word
+          embeddings with the positional embeddings and the token type embeddings,
+          then performing layer normalization. This is the input to the transformer.
+        """
+        return self.embedding_output
+
+    def get_embedding_table(self):
+        return self.embedding_table
+
+
+def gelu(x):
+    """Gaussian Error Linear Unit.
+
+    This is a smoother version of the ReLU.
+    Original paper: https://arxiv.org/abs/1606.08415
+
+    Args:
+      x: float Tensor to perform activation.
+
+    Returns:
+      `x` with the GELU activation applied.
+    """
+    cdf = 0.5 * (
+        1.0 + tf.tanh((np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))
+    )
+    return x * cdf
+
+
+def get_activation(activation_string):
+    """Maps a string to a Python function, e.g., "relu" => `tf.nn.relu`.
+
+    Args:
+      activation_string: String name of the activation function.
+
+    Returns:
+      A Python function corresponding to the activation function. If
+      `activation_string` is None, empty, or "linear", this will return None.
+      If `activation_string` is not a string, it will return `activation_string`.
+
+    Raises:
+      ValueError: The `activation_string` does not correspond to a known
+        activation.
+    """
+
+    # We assume that anything that's not a string is already an activation
+    # function, so we just return it.
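The tanh approximation of GELU defined above is easy to sanity-check numerically. This NumPy re-implementation mirrors the same constants as the TensorFlow version:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit,
    # using the same 0.044715 and sqrt(2/pi) constants as the TF code.
    cdf = 0.5 * (
        1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3)))
    )
    return x * cdf

x = np.array([-5.0, 0.0, 1.0, 5.0])
y = gelu(x)
# GELU is ~0 for large negative inputs, exactly 0 at 0,
# ~0.84 at 1, and ~x for large positive inputs.
```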
+ if not isinstance(activation_string, six.string_types): + return activation_string + + if not activation_string: + return None + + act = activation_string.lower() + if act == 'linear': + return None + elif act == 'relu': + return tf.nn.relu + elif act == 'gelu': + return gelu + elif act == 'tanh': + return tf.tanh + else: + raise ValueError('Unsupported activation: %s' % act) + + +def get_assignment_map_from_checkpoint(tvars, init_checkpoint): + """Compute the union of the current variables and checkpoint variables.""" + assignment_map = {} + initialized_variable_names = {} + + name_to_variable = collections.OrderedDict() + for var in tvars: + name = var.name + m = re.match('^(.*):\\d+$', name) + if m is not None: + name = m.group(1) + name_to_variable[name] = var + + init_vars = tf.train.list_variables(init_checkpoint) + + assignment_map = collections.OrderedDict() + for x in init_vars: + (name, var) = (x[0], x[1]) + if name not in name_to_variable: + continue + assignment_map[name] = name + initialized_variable_names[name] = 1 + initialized_variable_names[name + ':0'] = 1 + + return (assignment_map, initialized_variable_names) + + +def dropout(input_tensor, dropout_prob): + """Perform dropout. + + Args: + input_tensor: float Tensor. + dropout_prob: Python float. The probability of dropping out a value (NOT of + *keeping* a dimension as in `tf.nn.dropout`). + + Returns: + A version of `input_tensor` with dropout applied. 
+    """
+    if dropout_prob is None or dropout_prob == 0.0:
+        return input_tensor
+
+    output = tf.nn.dropout(input_tensor, 1.0 - dropout_prob)
+    return output
+
+
+def layer_norm(input_tensor, name = None):
+    """Run layer normalization on the last dimension of the tensor."""
+    return tf.contrib.layers.layer_norm(
+        inputs = input_tensor,
+        begin_norm_axis = -1,
+        begin_params_axis = -1,
+        scope = name,
+    )
+
+
+def layer_norm_and_dropout(input_tensor, dropout_prob, name = None):
+    """Runs layer normalization followed by dropout."""
+    output_tensor = layer_norm(input_tensor, name)
+    output_tensor = dropout(output_tensor, dropout_prob)
+    return output_tensor
+
+
+def create_initializer(initializer_range = 0.02):
+    """Creates a `truncated_normal_initializer` with the given range."""
+    return tf.truncated_normal_initializer(stddev = initializer_range)
+
+
+def embedding_lookup(
+    input_ids,
+    vocab_size,
+    embedding_size = 128,
+    initializer_range = 0.02,
+    word_embedding_name = 'word_embeddings',
+    use_one_hot_embeddings = False,
+):
+    """Looks up word embeddings for an id tensor.
+
+    Args:
+      input_ids: int32 Tensor of shape [batch_size, seq_length] containing word
+        ids.
+      vocab_size: int. Size of the embedding vocabulary.
+      embedding_size: int. Width of the word embeddings.
+      initializer_range: float. Embedding initialization range.
+      word_embedding_name: string. Name of the embedding table.
+      use_one_hot_embeddings: bool. If True, use one-hot method for word
+        embeddings. If False, use `tf.gather()`.
+
+    Returns:
+      float Tensor of shape [batch_size, seq_length, embedding_size].
+    """
+    # This function assumes that the input is of shape [batch_size, seq_length,
+    # num_inputs].
+    #
+    # If the input is a 2D tensor of shape [batch_size, seq_length], we
+    # reshape to [batch_size, seq_length, 1].
+ if input_ids.shape.ndims == 2: + input_ids = tf.expand_dims(input_ids, axis = [-1]) + + embedding_table = tf.get_variable( + name = word_embedding_name, + shape = [vocab_size, embedding_size], + initializer = create_initializer(initializer_range), + ) + + flat_input_ids = tf.reshape(input_ids, [-1]) + if use_one_hot_embeddings: + one_hot_input_ids = tf.one_hot(flat_input_ids, depth = vocab_size) + output = tf.matmul(one_hot_input_ids, embedding_table) + else: + output = tf.gather(embedding_table, flat_input_ids) + + input_shape = get_shape_list(input_ids) + + output = tf.reshape( + output, input_shape[0:-1] + [input_shape[-1] * embedding_size] + ) + return (output, embedding_table) + + +def embedding_postprocessor( + input_tensor, + use_token_type = False, + token_type_ids = None, + token_type_vocab_size = 16, + token_type_embedding_name = 'token_type_embeddings', + use_position_embeddings = True, + position_embedding_name = 'position_embeddings', + initializer_range = 0.02, + max_position_embeddings = 512, + dropout_prob = 0.1, +): + """Performs various post-processing on a word embedding tensor. + + Args: + input_tensor: float Tensor of shape [batch_size, seq_length, + embedding_size]. + use_token_type: bool. Whether to add embeddings for `token_type_ids`. + token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length]. + Must be specified if `use_token_type` is True. + token_type_vocab_size: int. The vocabulary size of `token_type_ids`. + token_type_embedding_name: string. The name of the embedding table variable + for token type ids. + use_position_embeddings: bool. Whether to add position embeddings for the + position of each token in the sequence. + position_embedding_name: string. The name of the embedding table variable + for positional embeddings. + initializer_range: float. Range of the weight initialization. + max_position_embeddings: int. Maximum sequence length that might ever be + used with this model. 
This can be longer than the sequence length of + input_tensor, but cannot be shorter. + dropout_prob: float. Dropout probability applied to the final output tensor. + + Returns: + float tensor with same shape as `input_tensor`. + + Raises: + ValueError: One of the tensor shapes or input values is invalid. + """ + input_shape = get_shape_list(input_tensor, expected_rank = 3) + batch_size = input_shape[0] + seq_length = input_shape[1] + width = input_shape[2] + + output = input_tensor + + if use_token_type: + if token_type_ids is None: + raise ValueError( + '`token_type_ids` must be specified if' + '`use_token_type` is True.' + ) + token_type_table = tf.get_variable( + name = token_type_embedding_name, + shape = [token_type_vocab_size, width], + initializer = create_initializer(initializer_range), + ) + # This vocab will be small so we always do one-hot here, since it is always + # faster for a small vocabulary. + flat_token_type_ids = tf.reshape(token_type_ids, [-1]) + one_hot_ids = tf.one_hot( + flat_token_type_ids, depth = token_type_vocab_size + ) + token_type_embeddings = tf.matmul(one_hot_ids, token_type_table) + token_type_embeddings = tf.reshape( + token_type_embeddings, [batch_size, seq_length, width] + ) + output += token_type_embeddings + + if use_position_embeddings: + assert_op = tf.assert_less_equal(seq_length, max_position_embeddings) + with tf.control_dependencies([assert_op]): + full_position_embeddings = tf.get_variable( + name = position_embedding_name, + shape = [max_position_embeddings, width], + initializer = create_initializer(initializer_range), + ) + # Since the position embedding table is a learned variable, we create it + # using a (long) sequence length `max_position_embeddings`. The actual + # sequence length might be shorter than this, for faster training of + # tasks that do not have long sequences. 
+ # + # So `full_position_embeddings` is effectively an embedding table + # for position [0, 1, 2, ..., max_position_embeddings-1], and the current + # sequence has positions [0, 1, 2, ... seq_length-1], so we can just + # perform a slice. + position_embeddings = tf.slice( + full_position_embeddings, [0, 0], [seq_length, -1] + ) + num_dims = len(output.shape.as_list()) + + # Only the last two dimensions are relevant (`seq_length` and `width`), so + # we broadcast among the first dimensions, which is typically just + # the batch size. + position_broadcast_shape = [] + for _ in range(num_dims - 2): + position_broadcast_shape.append(1) + position_broadcast_shape.extend([seq_length, width]) + position_embeddings = tf.reshape( + position_embeddings, position_broadcast_shape + ) + output += position_embeddings + + output = layer_norm_and_dropout(output, dropout_prob) + return output + + +def create_attention_mask_from_input_mask(from_tensor, to_mask): + """Create 3D attention mask from a 2D tensor mask. + + Args: + from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...]. + to_mask: int32 Tensor of shape [batch_size, to_seq_length]. + + Returns: + float Tensor of shape [batch_size, from_seq_length, to_seq_length]. + """ + from_shape = get_shape_list(from_tensor, expected_rank = [2, 3]) + batch_size = from_shape[0] + from_seq_length = from_shape[1] + + to_shape = get_shape_list(to_mask, expected_rank = 2) + to_seq_length = to_shape[1] + + to_mask = tf.cast( + tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32 + ) + + # We don't assume that `from_tensor` is a mask (although it could be). We + # don't actually care if we attend *from* padding tokens (only *to* padding) + # tokens so we create a tensor of all ones. + # + # `broadcast_ones` = [batch_size, from_seq_length, 1] + broadcast_ones = tf.ones( + shape = [batch_size, from_seq_length, 1], dtype = tf.float32 + ) + + # Here we broadcast along two dimensions to create the mask. 
+    mask = broadcast_ones * to_mask
+
+    return mask
+
+
+def attention_layer(
+    from_tensor,
+    to_tensor,
+    attention_mask = None,
+    num_attention_heads = 1,
+    size_per_head = 512,
+    query_act = None,
+    key_act = None,
+    value_act = None,
+    attention_probs_dropout_prob = 0.0,
+    initializer_range = 0.02,
+    do_return_2d_tensor = False,
+    batch_size = None,
+    from_seq_length = None,
+    to_seq_length = None,
+):
+    """Performs multi-headed attention from `from_tensor` to `to_tensor`.
+
+    This is an implementation of multi-headed attention based on "Attention
+    Is All You Need". If `from_tensor` and `to_tensor` are the same, then
+    this is self-attention. Each timestep in `from_tensor` attends to the
+    corresponding sequence in `to_tensor`, and returns a fixed-width vector.
+
+    This function first projects `from_tensor` into a "query" tensor and
+    `to_tensor` into "key" and "value" tensors. These are (effectively) a list
+    of tensors of length `num_attention_heads`, where each tensor is of shape
+    [batch_size, seq_length, size_per_head].
+
+    Then, the query and key tensors are dot-producted and scaled. These are
+    softmaxed to obtain attention probabilities. The value tensors are then
+    interpolated by these probabilities, then concatenated back to a single
+    tensor and returned.
+
+    In practice, the multi-headed attention is done with transposes and
+    reshapes rather than actual separate tensors.
+
+    Args:
+      from_tensor: float Tensor of shape [batch_size, from_seq_length,
+        from_width].
+      to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].
+      attention_mask: (optional) int32 Tensor of shape [batch_size,
+        from_seq_length, to_seq_length]. The values should be 1 or 0. The
+        attention scores will effectively be set to -infinity for any positions in
+        the mask that are 0, and will be unchanged for positions that are 1.
+      num_attention_heads: int. Number of attention heads.
+      size_per_head: int. Size of each attention head.
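The mask construction in `create_attention_mask_from_input_mask`, together with the additive `-10000.0` trick applied later in `attention_layer`, reduces to a couple of broadcasts. A NumPy sketch with made-up shapes:

```python
import numpy as np

batch_size, from_seq_length, to_seq_length = 2, 3, 4
to_mask = np.array([[1, 1, 1, 0],
                    [1, 0, 0, 0]])  # 0 marks padding in the `to` sequence

# [batch_size, 1, to_seq_length]: one `to` mask, broadcast over every
# `from` position via the ones tensor below.
to_mask = to_mask.reshape(batch_size, 1, to_seq_length).astype(np.float32)
broadcast_ones = np.ones((batch_size, from_seq_length, 1), dtype=np.float32)
mask = broadcast_ones * to_mask  # [batch_size, from_seq_length, to_seq_length]

# Scores at masked positions are pushed to a large negative value before
# the softmax, which effectively zeroes their attention probability.
adder = (1.0 - mask) * -10000.0
```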
+ query_act: (optional) Activation function for the query transform. + key_act: (optional) Activation function for the key transform. + value_act: (optional) Activation function for the value transform. + attention_probs_dropout_prob: (optional) float. Dropout probability of the + attention probabilities. + initializer_range: float. Range of the weight initializer. + do_return_2d_tensor: bool. If True, the output will be of shape [batch_size + * from_seq_length, num_attention_heads * size_per_head]. If False, the + output will be of shape [batch_size, from_seq_length, num_attention_heads + * size_per_head]. + batch_size: (Optional) int. If the input is 2D, this might be the batch size + of the 3D version of the `from_tensor` and `to_tensor`. + from_seq_length: (Optional) If the input is 2D, this might be the seq length + of the 3D version of the `from_tensor`. + to_seq_length: (Optional) If the input is 2D, this might be the seq length + of the 3D version of the `to_tensor`. + + Returns: + float Tensor of shape [batch_size, from_seq_length, + num_attention_heads * size_per_head]. (If `do_return_2d_tensor` is + true, this will be of shape [batch_size * from_seq_length, + num_attention_heads * size_per_head]). + + Raises: + ValueError: Any of the arguments or tensor shapes are invalid. + """ + + def transpose_for_scores( + input_tensor, batch_size, num_attention_heads, seq_length, width + ): + output_tensor = tf.reshape( + input_tensor, [batch_size, seq_length, num_attention_heads, width] + ) + + output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3]) + return output_tensor + + from_shape = get_shape_list(from_tensor, expected_rank = [2, 3]) + to_shape = get_shape_list(to_tensor, expected_rank = [2, 3]) + + if len(from_shape) != len(to_shape): + raise ValueError( + 'The rank of `from_tensor` must match the rank of `to_tensor`.' 
+ ) + + if len(from_shape) == 3: + batch_size = from_shape[0] + from_seq_length = from_shape[1] + to_seq_length = to_shape[1] + elif len(from_shape) == 2: + if ( + batch_size is None + or from_seq_length is None + or to_seq_length is None + ): + raise ValueError( + 'When passing in rank 2 tensors to attention_layer, the values ' + 'for `batch_size`, `from_seq_length`, and `to_seq_length` ' + 'must all be specified.' + ) + + # Scalar dimensions referenced here: + # B = batch size (number of sequences) + # F = `from_tensor` sequence length + # T = `to_tensor` sequence length + # N = `num_attention_heads` + # H = `size_per_head` + + from_tensor_2d = reshape_to_matrix(from_tensor) + to_tensor_2d = reshape_to_matrix(to_tensor) + + # `query_layer` = [B*F, N*H] + query_layer = tf.layers.dense( + from_tensor_2d, + num_attention_heads * size_per_head, + activation = query_act, + name = 'query', + kernel_initializer = create_initializer(initializer_range), + ) + + # `key_layer` = [B*T, N*H] + key_layer = tf.layers.dense( + to_tensor_2d, + num_attention_heads * size_per_head, + activation = key_act, + name = 'key', + kernel_initializer = create_initializer(initializer_range), + ) + + # `value_layer` = [B*T, N*H] + value_layer = tf.layers.dense( + to_tensor_2d, + num_attention_heads * size_per_head, + activation = value_act, + name = 'value', + kernel_initializer = create_initializer(initializer_range), + ) + + # `query_layer` = [B, N, F, H] + query_layer = transpose_for_scores( + query_layer, + batch_size, + num_attention_heads, + from_seq_length, + size_per_head, + ) + + # `key_layer` = [B, N, T, H] + key_layer = transpose_for_scores( + key_layer, batch_size, num_attention_heads, to_seq_length, size_per_head + ) + + # Take the dot product between "query" and "key" to get the raw + # attention scores. 
+ # `attention_scores` = [B, N, F, T] + attention_scores = tf.matmul(query_layer, key_layer, transpose_b = True) + attention_scores = tf.multiply( + attention_scores, 1.0 / math.sqrt(float(size_per_head)) + ) + + if attention_mask is not None: + # `attention_mask` = [B, 1, F, T] + attention_mask = tf.expand_dims(attention_mask, axis = [1]) + + # Since attention_mask is 1.0 for positions we want to attend and 0.0 for + # masked positions, this operation will create a tensor which is 0.0 for + # positions we want to attend and -10000.0 for masked positions. + adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0 + + # Since we are adding it to the raw scores before the softmax, this is + # effectively the same as removing these entirely. + attention_scores += adder + + # Normalize the attention scores to probabilities. + # `attention_probs` = [B, N, F, T] + attention_probs = tf.nn.softmax(attention_scores) + + # This is actually dropping out entire tokens to attend to, which might + # seem a bit unusual, but is taken from the original Transformer paper. 
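As an aside, the scaling and additive-masking steps above can be traced numerically. A minimal NumPy sketch (sizes and values are illustrative, not taken from the repository):

```python
import numpy as np

size_per_head = 4
# Raw scores for one query over four key positions, scaled by
# 1/sqrt(size_per_head) as in the attention code above.
scores = np.array([2.0, 1.0, 0.5, 3.0]) / np.sqrt(size_per_head)

# attention_mask: 1.0 = attend, 0.0 = do not attend (position 2 is masked).
mask = np.array([1.0, 1.0, 0.0, 1.0])

# The additive trick: -10000.0 at masked positions, 0.0 elsewhere.
adder = (1.0 - mask) * -10000.0

# After the softmax, the masked position gets (effectively) zero probability.
probs = np.exp(scores + adder) / np.sum(np.exp(scores + adder))
```

Because the -10000.0 is added to the scores before the softmax rather than multiplied in afterwards, the surviving probabilities still sum to one.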
+ attention_probs = dropout(attention_probs, attention_probs_dropout_prob) + + # `value_layer` = [B, T, N, H] + value_layer = tf.reshape( + value_layer, + [batch_size, to_seq_length, num_attention_heads, size_per_head], + ) + + # `value_layer` = [B, N, T, H] + value_layer = tf.transpose(value_layer, [0, 2, 1, 3]) + + # `context_layer` = [B, N, F, H] + context_layer = tf.matmul(attention_probs, value_layer) + + # `context_layer` = [B, F, N, H] + context_layer = tf.transpose(context_layer, [0, 2, 1, 3]) + + if do_return_2d_tensor: + # `context_layer` = [B*F, N*H] + context_layer = tf.reshape( + context_layer, + [batch_size * from_seq_length, num_attention_heads * size_per_head], + ) + else: + # `context_layer` = [B, F, N*H] + context_layer = tf.reshape( + context_layer, + [batch_size, from_seq_length, num_attention_heads * size_per_head], + ) + + return context_layer + + +def transformer_model( + input_tensor, + memory, + attention_mask = None, + hidden_size = 768, + num_hidden_layers = 12, + num_attention_heads = 12, + intermediate_size = 3072, + intermediate_act_fn = gelu, + hidden_dropout_prob = 0.1, + attention_probs_dropout_prob = 0.1, + initializer_range = 0.02, + do_return_all_layers = False, +): + """Multi-headed, multi-layer Transformer from "Attention is All You Need". + + This is almost an exact implementation of the original Transformer encoder. + + See the original paper: + https://arxiv.org/abs/1706.03762 + + Also see: + https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py + + Args: + input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size]. + memory: float Tensor of shape [batch_size, memory_length, hidden_size]. + Every layer attends to it, i.e. it is used as the `to_tensor` of the + attention. + attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length, + seq_length], with 1 for positions that can be attended to and 0 in + positions that should not be. + hidden_size: int. Hidden size of the Transformer. + num_hidden_layers: int. Number of layers (blocks) in the Transformer. + num_attention_heads: int.
Number of attention heads in the Transformer. + intermediate_size: int. The size of the "intermediate" (a.k.a., feed + forward) layer. + intermediate_act_fn: function. The non-linear activation function to apply + to the output of the intermediate/feed-forward layer. + hidden_dropout_prob: float. Dropout probability for the hidden layers. + attention_probs_dropout_prob: float. Dropout probability of the attention + probabilities. + initializer_range: float. Range of the initializer (stddev of truncated + normal). + do_return_all_layers: Whether to also return all layers or just the final + layer. + + Returns: + float Tensor of shape [batch_size, seq_length, hidden_size], the final + hidden layer of the Transformer. + + Raises: + ValueError: A Tensor shape or parameter is invalid. + """ + if hidden_size % num_attention_heads != 0: + raise ValueError( + 'The hidden size (%d) is not a multiple of the number of attention ' + 'heads (%d)' % (hidden_size, num_attention_heads) + ) + + attention_head_size = int(hidden_size / num_attention_heads) + input_shape = get_shape_list(input_tensor, expected_rank = 3) + batch_size = input_shape[0] + seq_length = input_shape[1] + memory_length = tf.shape(memory)[1] + input_width = input_shape[2] + + # The Transformer performs sum residuals on all layers so the input needs + # to be the same as the hidden size. + if input_width != hidden_size: + raise ValueError( + 'The width of the input tensor (%d) != hidden size (%d)' + % (input_width, hidden_size) + ) + + # We keep the representation as a 2D tensor to avoid re-shaping it back and + # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on + # the GPU/CPU but may not be free on the TPU, so we want to minimize them to + # help the optimizer. 
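The 2D/3D round trip this comment describes (implemented by `reshape_to_matrix` and `reshape_from_matrix` further down) just collapses and restores the leading axes. A NumPy sketch with made-up sizes:

```python
import numpy as np

batch_size, seq_length, hidden_size = 2, 3, 4
x = np.arange(batch_size * seq_length * hidden_size, dtype=np.float32)
x = x.reshape(batch_size, seq_length, hidden_size)

# reshape_to_matrix: collapse everything but the last axis.
x_2d = x.reshape(-1, x.shape[-1])
assert x_2d.shape == (batch_size * seq_length, hidden_size)

# reshape_from_matrix: restore the original leading axes; no data moves,
# so the round trip is exact.
x_3d = x_2d.reshape(batch_size, seq_length, hidden_size)
assert np.array_equal(x, x_3d)
```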
+ prev_output = reshape_to_matrix(input_tensor) + prev_memory = reshape_to_matrix(memory) + + all_layer_outputs = [] + for layer_idx in range(num_hidden_layers): + with tf.variable_scope('layer_%d' % layer_idx): + layer_input = prev_output + + with tf.variable_scope('attention'): + attention_heads = [] + with tf.variable_scope('self'): + attention_head = attention_layer( + from_tensor = layer_input, + to_tensor = prev_memory, + attention_mask = attention_mask, + num_attention_heads = num_attention_heads, + size_per_head = attention_head_size, + attention_probs_dropout_prob = attention_probs_dropout_prob, + initializer_range = initializer_range, + do_return_2d_tensor = True, + batch_size = batch_size, + from_seq_length = seq_length, + to_seq_length = memory_length, + ) + attention_heads.append(attention_head) + + attention_output = None + if len(attention_heads) == 1: + attention_output = attention_heads[0] + else: + # In the case where we have other sequences, we just concatenate + # them to the self-attention head before the projection. + attention_output = tf.concat(attention_heads, axis = -1) + + # Run a linear projection of `hidden_size` then add a residual + # with `layer_input`. + with tf.variable_scope('output'): + attention_output = tf.layers.dense( + attention_output, + hidden_size, + kernel_initializer = create_initializer( + initializer_range + ), + ) + attention_output = dropout( + attention_output, hidden_dropout_prob + ) + attention_output = layer_norm( + attention_output + layer_input + ) + + # The activation is only applied to the "intermediate" hidden layer. + with tf.variable_scope('intermediate'): + intermediate_output = tf.layers.dense( + attention_output, + intermediate_size, + activation = intermediate_act_fn, + kernel_initializer = create_initializer(initializer_range), + ) + + # Down-project back to `hidden_size` then add the residual. 
+ with tf.variable_scope('output'): + layer_output = tf.layers.dense( + intermediate_output, + hidden_size, + kernel_initializer = create_initializer(initializer_range), + ) + layer_output = dropout(layer_output, hidden_dropout_prob) + layer_output = layer_norm(layer_output + attention_output) + prev_output = layer_output + all_layer_outputs.append(layer_output) + + if do_return_all_layers: + final_outputs = [] + for layer_output in all_layer_outputs: + final_output = reshape_from_matrix(layer_output, input_shape) + final_outputs.append(final_output) + return final_outputs + else: + final_output = reshape_from_matrix(prev_output, input_shape) + return final_output + + +def get_shape_list(tensor, expected_rank = None, name = None): + """Returns a list of the shape of tensor, preferring static dimensions. + + Args: + tensor: A tf.Tensor object to find the shape of. + expected_rank: (optional) int. The expected rank of `tensor`. If this is + specified and the `tensor` has a different rank, an exception will be + thrown. + name: Optional name of the tensor for the error message. + + Returns: + A list of dimensions of the shape of tensor. All static dimensions will + be returned as python integers, and dynamic dimensions will be returned + as tf.Tensor scalars. + """ + if name is None: + name = tensor.name + + if expected_rank is not None: + assert_rank(tensor, expected_rank, name) + + shape = tensor.shape.as_list() + + non_static_indexes = [] + for (index, dim) in enumerate(shape): + if dim is None: + non_static_indexes.append(index) + + if not non_static_indexes: + return shape + + dyn_shape = tf.shape(tensor) + for index in non_static_indexes: + shape[index] = dyn_shape[index] + return shape + + +def reshape_to_matrix(input_tensor): + """Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).""" + ndims = input_tensor.shape.ndims + if ndims < 2: + raise ValueError( + 'Input tensor must have at least rank 2.
Shape = %s' + % (input_tensor.shape) + ) + if ndims == 2: + return input_tensor + + width = input_tensor.shape[-1] + output_tensor = tf.reshape(input_tensor, [-1, width]) + return output_tensor + + +def reshape_from_matrix(output_tensor, orig_shape_list): + """Reshapes a rank 2 tensor back to its original rank >= 2 tensor.""" + if len(orig_shape_list) == 2: + return output_tensor + + output_shape = get_shape_list(output_tensor) + + orig_dims = orig_shape_list[0:-1] + width = output_shape[-1] + + return tf.reshape(output_tensor, orig_dims + [width]) + + +def assert_rank(tensor, expected_rank, name = None): + """Raises an exception if the tensor rank is not of the expected rank. + + Args: + tensor: A tf.Tensor to check the rank of. + expected_rank: Python integer or list of integers, expected rank. + name: Optional name of the tensor for the error message. + + Raises: + ValueError: If the expected shape doesn't match the actual shape. + """ + if name is None: + name = tensor.name + + expected_rank_dict = {} + if isinstance(expected_rank, six.integer_types): + expected_rank_dict[expected_rank] = True + else: + for x in expected_rank: + expected_rank_dict[x] = True + + actual_rank = tensor.shape.ndims + if actual_rank not in expected_rank_dict: + scope_name = tf.get_variable_scope().name + raise ValueError( + 'For the tensor `%s` in scope `%s`, the actual rank ' + '`%d` (shape = %s) is not equal to the expected rank `%s`' + % ( + name, + scope_name, + actual_rank, + str(tensor.shape), + str(expected_rank), + ) + ) diff --git a/neural-machine-translation/dnc.py b/neural-machine-translation/dnc.py deleted file mode 100644 index 8df92cf..0000000 --- a/neural-machine-translation/dnc.py +++ /dev/null @@ -1,142 +0,0 @@ -# Copyright 2017 Google Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""DNC Cores. - -These modules create a DNC core. They take input, pass parameters to the memory -access module, and integrate the output of memory to form an output. -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections -import numpy as np -import sonnet as snt -import tensorflow as tf - -import access - -DNCState = collections.namedtuple('DNCState', ('access_output', 'access_state', - 'controller_state')) - - -class DNC(snt.RNNCore): - """DNC core module. - - Contains controller and memory access module. - """ - - def __init__(self, - access_config, - controller_config, - output_size, - clip_value=None, - name='dnc'): - """Initializes the DNC core. - - Args: - access_config: dictionary of access module configurations. - controller_config: dictionary of controller (LSTM) module configurations. - output_size: output dimension size of core. - clip_value: clips controller and core output values to between - `[-clip_value, clip_value]` if specified. - name: module name (default 'dnc'). - - Raises: - TypeError: if direct_input_size is not None for any access module other - than KeyValueMemory. 
- """ - super(DNC, self).__init__(name=name) - - with self._enter_variable_scope(): - self._controller = snt.LSTM(**controller_config) - self._access = access.MemoryAccess(**access_config) - - self._access_output_size = np.prod(self._access.output_size.as_list()) - self._output_size = output_size - self._clip_value = clip_value or 0 - - self._output_size = tf.TensorShape([output_size]) - self._state_size = DNCState( - access_output=self._access_output_size, - access_state=self._access.state_size, - controller_state=self._controller.state_size) - - def _clip_if_enabled(self, x): - if self._clip_value > 0: - return tf.clip_by_value(x, -self._clip_value, self._clip_value) - else: - return x - - def _build(self, inputs, prev_state): - """Connects the DNC core into the graph. - - Args: - inputs: Tensor input. - prev_state: A `DNCState` tuple containing the fields `access_output`, - `access_state` and `controller_state`. `access_state` is a 3-D Tensor - of shape `[batch_size, num_reads, word_size]` containing read words. - `access_state` is a tuple of the access module's state, and - `controller_state` is a tuple of controller module's state. - - Returns: - A tuple `(output, next_state)` where `output` is a tensor and `next_state` - is a `DNCState` tuple containing the fields `access_output`, - `access_state`, and `controller_state`. 
- """ - - prev_access_output = prev_state.access_output - prev_access_state = prev_state.access_state - prev_controller_state = prev_state.controller_state - - batch_flatten = snt.BatchFlatten() - controller_input = tf.concat( - [batch_flatten(inputs), batch_flatten(prev_access_output)], 1) - - controller_output, controller_state = self._controller( - controller_input, prev_controller_state) - - controller_output = self._clip_if_enabled(controller_output) - controller_state = snt.nest.map(self._clip_if_enabled, controller_state) - - access_output, access_state = self._access(controller_output, - prev_access_state) - - output = tf.concat([controller_output, batch_flatten(access_output)], 1) - output = snt.Linear( - output_size=self._output_size.as_list()[0], - name='output_linear')(output) - output = self._clip_if_enabled(output) - - return output, DNCState( - access_output=access_output, - access_state=access_state, - controller_state=controller_state) - - def initial_state(self, batch_size, dtype=tf.float32): - return DNCState( - controller_state=self._controller.initial_state(batch_size, dtype), - access_state=self._access.initial_state(batch_size, dtype), - access_output=tf.zeros( - [batch_size] + self._access.output_size.as_list(), dtype)) - - @property - def state_size(self): - return self._state_size - - @property - def output_size(self): - return self._output_size diff --git a/neural-machine-translation/electra/model/optimization.py b/neural-machine-translation/electra/model/optimization.py new file mode 100644 index 0000000..f035b2a --- /dev/null +++ b/neural-machine-translation/electra/model/optimization.py @@ -0,0 +1,228 @@ +# coding=utf-8 +# Copyright 2020 The Google Research Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Functions and classes related to optimization (weight updates). +Modified from the original BERT code to allow for having separate learning +rates for different layers of the network. +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import re +import tensorflow.compat.v1 as tf + + +def create_optimizer( + loss, + learning_rate, + num_train_steps, + weight_decay_rate = 0.0, + use_tpu = False, + warmup_steps = 0, + warmup_proportion = 0, + lr_decay_power = 1.0, + layerwise_lr_decay_power = -1, + n_transformer_layers = None, + decoder_layers = None, +): + """Creates an optimizer and training op.""" + global_step = tf.train.get_or_create_global_step() + learning_rate = tf.train.polynomial_decay( + learning_rate, + global_step, + num_train_steps, + end_learning_rate = 0.0, + power = lr_decay_power, + cycle = False, + ) + warmup_steps = max(num_train_steps * warmup_proportion, warmup_steps) + learning_rate *= tf.minimum( + 1.0, + tf.cast(global_step, tf.float32) / tf.cast(warmup_steps, tf.float32), + ) + cp_learning_rate = learning_rate + + if layerwise_lr_decay_power > 0: + learning_rate = _get_layer_lrs( + learning_rate, + layerwise_lr_decay_power, + n_transformer_layers, + decoder_layers, + ) + learning_rate['embedding_shared_weights/'] = cp_learning_rate + learning_rate['decoder_stack/layer_normalization/'] = cp_learning_rate + print(learning_rate) + optimizer = AdamWeightDecayOptimizer( + learning_rate = learning_rate, + weight_decay_rate = weight_decay_rate, + 
beta_1 = 0.9, + beta_2 = 0.999, + epsilon = 1e-6, + exclude_from_weight_decay = ['LayerNorm', 'layer_norm', 'bias'], + ) + if use_tpu: + optimizer = tf.tpu.CrossShardOptimizer(optimizer) + + tvars = tf.trainable_variables() + grads = tf.gradients(loss, tvars) + (grads, _) = tf.clip_by_global_norm(grads, clip_norm = 1.0) + train_op = optimizer.apply_gradients( + zip(grads, tvars), global_step = global_step + ) + new_global_step = global_step + 1 + train_op = tf.group(train_op, [global_step.assign(new_global_step)]) + return train_op + + +class AdamWeightDecayOptimizer(tf.train.Optimizer): + """A basic Adam optimizer that includes "correct" L2 weight decay.""" + + def __init__( + self, + learning_rate, + weight_decay_rate = 0.0, + beta_1 = 0.9, + beta_2 = 0.999, + epsilon = 1e-6, + exclude_from_weight_decay = None, + name = 'AdamWeightDecayOptimizer', + ): + """Constructs an AdamWeightDecayOptimizer.""" + super(AdamWeightDecayOptimizer, self).__init__(False, name) + + self.learning_rate = learning_rate + self.weight_decay_rate = weight_decay_rate + self.beta_1 = beta_1 + self.beta_2 = beta_2 + self.epsilon = epsilon + self.exclude_from_weight_decay = exclude_from_weight_decay + + def _apply_gradients(self, grads_and_vars, learning_rate): + """See base class.""" + assignments = [] + for (grad, param) in grads_and_vars: + if grad is None or param is None: + continue + + param_name = self._get_variable_name(param.name) + + m = tf.get_variable( + name = param_name + '/adam_m', + shape = param.shape.as_list(), + dtype = tf.float32, + trainable = False, + initializer = tf.zeros_initializer(), + ) + v = tf.get_variable( + name = param_name + '/adam_v', + shape = param.shape.as_list(), + dtype = tf.float32, + trainable = False, + initializer = tf.zeros_initializer(), + ) + + # Standard Adam update.
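For intuition, one step of the update that follows (Adam moment estimates plus decoupled weight decay) can be traced with scalar stand-ins; the variable names mirror the code, but the numbers are illustrative:

```python
import math

beta_1, beta_2, epsilon = 0.9, 0.999, 1e-6
learning_rate, weight_decay_rate = 0.1, 0.01
param, grad, m, v = 1.0, 0.5, 0.0, 0.0  # first step: moments start at zero

# Adam moment estimates, as in the optimizer code.
next_m = beta_1 * m + (1.0 - beta_1) * grad
next_v = beta_2 * v + (1.0 - beta_2) * grad ** 2
update = next_m / (math.sqrt(next_v) + epsilon)

# Decoupled weight decay: added to the update, not the gradient, so it
# never enters the m/v moment estimates.
update += weight_decay_rate * param
next_param = param - learning_rate * update
```

With the moments starting at zero, the first step gives `next_m = 0.05` and `next_v = 0.00025`, so the update is dominated by `next_m / sqrt(next_v)` plus the small decay term.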
+ next_m = tf.multiply(self.beta_1, m) + tf.multiply( + 1.0 - self.beta_1, grad + ) + next_v = tf.multiply(self.beta_2, v) + tf.multiply( + 1.0 - self.beta_2, tf.square(grad) + ) + update = next_m / (tf.sqrt(next_v) + self.epsilon) + + # Just adding the square of the weights to the loss function is *not* + # the correct way of using L2 regularization/weight decay with Adam, + # since that will interact with the m and v parameters in strange ways. + # + # Instead we want to decay the weights in a manner that doesn't interact + # with the m/v parameters. This is equivalent to adding the square + # of the weights to the loss with plain (non-momentum) SGD. + if self.weight_decay_rate > 0: + if self._do_use_weight_decay(param_name): + update += self.weight_decay_rate * param + + update_with_lr = learning_rate * update + next_param = param - update_with_lr + + assignments.extend( + [param.assign(next_param), m.assign(next_m), v.assign(next_v)] + ) + + return assignments + + def apply_gradients(self, grads_and_vars, global_step = None, name = None): + if isinstance(self.learning_rate, dict): + key_to_grads_and_vars = {} + for grad, var in grads_and_vars: + update_for_var = False + for key in self.learning_rate: + if key in var.name: + update_for_var = True + if key not in key_to_grads_and_vars: + key_to_grads_and_vars[key] = [] + key_to_grads_and_vars[key].append((grad, var)) + if not update_for_var: + raise ValueError( + 'No learning rate specified for variable', var + ) + assignments = [] + for key, key_grads_and_vars in key_to_grads_and_vars.items(): + assignments += self._apply_gradients( + key_grads_and_vars, self.learning_rate[key] + ) + else: + assignments = self._apply_gradients( + grads_and_vars, self.learning_rate + ) + return tf.group(*assignments, name = name) + + def _do_use_weight_decay(self, param_name): + """Whether to use L2 weight decay for `param_name`.""" + if not self.weight_decay_rate: + return False + if self.exclude_from_weight_decay: + for r in
self.exclude_from_weight_decay: + if re.search(r, param_name) is not None: + return False + return True + + def _get_variable_name(self, param_name): + """Get the variable name from the tensor name.""" + m = re.match('^(.*):\\d+$', param_name) + if m is not None: + param_name = m.group(1) + return param_name + + +def _get_layer_lrs(learning_rate, layer_decay, n_layers, decoder_layers): + """Have lower learning rates for layers closer to the input.""" + key_to_depths = collections.OrderedDict( + { + '/embeddings/': 0, + '/embeddings_project/': 0, + 'task_specific/': n_layers + 2, + } + ) + for layer in range(n_layers): + key_to_depths['encoder/layer_' + str(layer) + '/'] = layer + 1 + for layer in range(decoder_layers): + key_to_depths['decoder_stack/layer_' + str(layer) + '/'] = layer + 1 + return { + key: learning_rate * (layer_decay ** (n_layers + 2 - depth)) + for key, depth in key_to_depths.items() + } diff --git a/neural-machine-translation/english-train b/neural-machine-translation/english-train deleted file mode 100644 index b6752f8..0000000 --- a/neural-machine-translation/english-train +++ /dev/null @@ -1,500 +0,0 @@ -Rachel Pike : The science behind a climate headline -In 4 minutes , atmospheric chemist Rachel Pike provides a glimpse of the massive scientific effort behind the bold headlines on climate change , with her team -- one of thousands who contributed -- taking a risky flight over the rainforest in pursuit of data on a key molecule . -I 'd like to talk to you today about the scale of the scientific effort that goes into making the headlines you see in the paper . -Headlines that look like this when they have to do with climate change , and headlines that look like this when they have to do with air quality or smog . -They are both two branches of the same field of atmospheric science . 
-Recently the headlines looked like this when the Intergovernmental Panel on Climate Change , or IPCC , put out their report on the state of understanding of the atmospheric system . -That report was written by 620 scientists from 40 countries . -They wrote almost a thousand pages on the topic . -And all of those pages were reviewed by another 400-plus scientists and reviewers , from 113 countries . -It 's a big community . It 's such a big community , in fact , that our annual gathering is the largest scientific meeting in the world . -Over 15,000 scientists go to San Francisco every year for that . -And every one of those scientists is in a research group , and every research group studies a wide variety of topics . -For us at Cambridge , it 's as varied as the El Niño oscillation , which affects weather and climate , to the assimilation of satellite data , to emissions from crops that produce biofuels , which is what I happen to study . -And in each one of these research areas , of which there are even more , there are PhD students , like me , and we study incredibly narrow topics , things as narrow as a few processes or a few molecules . -And one of the molecules I study is called isoprene , which is here . It 's a small organic molecule . You 've probably never heard of it . -The weight of a paper clip is approximately equal to 900 zeta-illion -- 10 to the 21st -- molecules of isoprene . -But despite its very small weight , enough of it is emitted into the atmosphere every year to equal the weight of all the people on the planet . -It 's a huge amount of stuff . It 's equal to the weight of methane . -And because it 's so much stuff , it 's really important for the atmospheric system . -Because it 's important to the atmospheric system , we go to all lengths to study this thing . -We blow it up and look at the pieces . -This is the EUPHORE Smog Chamber in Spain . 
-Atmospheric explosions , or full combustion , takes about 15,000 times longer than what happens in your car . -But still , we look at the pieces . -We run enormous models on supercomputers ; this is what I happen to do . -Our models have hundreds of thousands of grid boxes calculating hundreds of variables each , on minute timescales . -And it takes weeks to perform our integrations . -And we perform dozens of integrations in order to understand what 's happening . -We also fly all over the world looking for this thing . -I recently joined a field campaign in Malaysia . There are others . -We found a global atmospheric watchtower there , in the middle of the rainforest , and hung hundreds of thousands of dollars worth of scientific equipment off this tower , to look for isoprene , and of course , other things while we were there . -This is the tower in the middle of the rainforest , from above . -And this is the tower from below . -And on part of that field campaign we even brought an aircraft with us . -And this plane , the model , BA146 , which was run by FAAM , normally flies 120 to 130 people . -So maybe you took a similar aircraft to get here today . -But we didn 't just fly it . We were flying at 100 meters above the top of the canopy to measure this molecule -- incredibly dangerous stuff . -We have to fly at a special incline in order to make the measurements . -We hire military and test pilots to do the maneuvering . -We have to get special flight clearance . -And as you come around the banks in these valleys , the forces can get up to two Gs . -And the scientists have to be completely harnessed in in order to make measurements while they 're on board . -So , as you can imagine , the inside of this aircraft doesn 't look like any plane you would take on vacation . -It 's a flying laboratory that we took to make measurements in the region of this molecule . -We do all of this to understand the chemistry of one molecule . 
-And when one student like me has some sort of inclination or understanding about that molecule , they write one scientific paper on the subject . -And out of that field campaign we 'll probably get a few dozen papers on a few dozen processes or molecules . -And as a body of knowledge builds up , it will form one subsection , or one sub-subsection of an assessment like the IPCC , although we have others . -And each one of the 11 chapters of the IPCC has six to ten subsections . -So you can imagine the scale of the effort . -In each one of those assessments that we write , we always tag on a summary , and the summary is written for a non-scientific audience . -And we hand that summary to journalists and policy makers , in order to make headlines like these . -Thank you very much . -Christopher deCharms : A look inside the brain in real time -Neuroscientist and inventor Christopher deCharms demonstrates a new way to use fMRI to show brain activity -- thoughts , emotions , pain -- while it is happening . In other words , you can actually see how you feel . -Hi . I 'm going to ask you to raise your arms and wave back , just the way I am -- kind of a royal wave . -You can mimic what you can see . -You can program the hundreds of muscles in your arm . -Soon , you 'll be able to look inside your brain and program , control the hundreds of brain areas that you see there . -I 'm going to tell you about that technology . -People have wanted to look inside the human mind , the human brain , for thousands of years . -Well , coming out of the research labs just now , for our generation , is the possibility to do that . -People envision this as being very difficult . -You had to take a spaceship , shrink it down , inject it into the bloodstream . -It was terribly dangerous . -You could be attacked by white blood cells in the arteries . -But now , we have a real technology to do this . -We 're going to fly into my colleague Peter 's brain . 
-We 're going to do it non-invasively using MRI . -We don 't have to inject anything . We don 't need radiation . -We will be able to fly into the anatomy of Peter 's brain -- literally , fly into his body -- but more importantly , we can look into his mind . -When Peter moves his arm , that yellow spot you see there is the interface to the functioning of Peter 's mind taking place . -Now you 've seen before that with electrodes you can control robotic arms , that brain imaging and scanners can show you the insides of brains . -What 's new is that that process has typically taken days or months of analysis . -We 've collapsed that through technology to milliseconds , and that allows us to let Peter to look at his brain in real time as he 's inside the scanner . -He can look at these 65,000 points of activation per second . -If he can see this pattern in his own brain , he can learn how to control it . -There have been three ways to try to impact the brain : the therapist 's couch , pills and the knife . -This is a fourth alternative that you are soon going to have . -We all know that as we form thoughts , they form deep channels in our minds and in our brains . -Chronic pain is an example . If you burn yourself , you pull your hand away . -But if you 're still in pain in six months ' or six years ' time , it 's because these circuits are producing pain that 's no longer helping you . -If we can look at the activation in the brain that 's producing the pain , we can form 3D models and watch in real time the brain process information , and then we can select the areas that produce the pain . -So put your arms back up and flex your bicep . -Now imagine that you will soon be able to look inside your brain and select brain areas to do that same thing . -What you 're seeing here is , we 've selected the pathways in the brain of a chronic pain patient . -This may shock you , but we 're literally reading this person 's brain in real time . 
-They 're watching their own brain activation , and they 're controlling the pathway that produces their pain . -They 're learning to flex this system that releases their own endogenous opiates . -As they do it , in the upper left is a display that 's yoked to their brain activation of their own pain being controlled . -When they control their brain , they can control their pain . -This is an investigational technology , but , in clinical trials , we 're seeing a 44 to 64 percent decrease in chronic pain patients . -This is not " The Matrix . " You can only do this to yourself . You take control . -I 've seen inside my brain . You will too , soon . -When you do , what do you want to control ? -You will be able to look at all the aspects that make you yourself , all your experiences . -These are some of the areas we 're working on today that I don 't have time to go into in detail . -But I want to leave with you the big question . -We are the first generation that 's going to be able to enter into , using this technology , the human mind and brain . -Where will we take it ? -Beeban Kidron : The shared wonder of film -Movies have the power to create a shared narrative experience and to shape memories and worldviews . British film director Beeban Kidron invokes iconic film scenes -- from & lt ; em & gt ; Miracle in Milan & lt ; / em & gt ; to & lt ; em & gt ; Boyz n the Hood & lt ; / em & gt ; -- as she shows how her group FILMCLUB shares great films with kids . -Evidence suggests that humans in all ages and from all cultures create their identity in some kind of narrative form . -From mother to daughter , preacher to congregant , teacher to pupil , storyteller to audience . -Whether in cave paintings or the latest uses of the Internet , human beings have always told their histories and truths through parable and fable . -We are inveterate storytellers . 
-But where , in our increasingly secular and fragmented world , do we offer communality of experience , unmediated by our own furious consumerism ? -And what narrative , what history , what identity , what moral code are we imparting to our young ? -Cinema is arguably the 20th century 's most influential art form . -Its artists told stories across national boundaries , in as many languages , genres and philosophies as one can imagine . -Indeed , it is hard to find a subject that film has yet to tackle . -During the last decade we 've seen a vast integration of global media , now dominated by a culture of the Hollywood blockbuster . -We are increasingly offered a diet in which sensation , not story , is king . -What was common to us all 40 years ago -- the telling of stories between generations -- is now rarified . -As a filmmaker , it worried me . -As a human being , it puts the fear of God in me . -What future could the young build with so little grasp of where they 've come from and so few narratives of what 's possible ? -The irony is palpable ; technical access has never been greater , cultural access never weaker . -And so in 2006 we set up FILMCLUB , an organization that ran weekly film screenings in schools followed by discussions . -If we could raid the annals of 100 years of film , maybe we could build a narrative that would deliver meaning to the fragmented and restless world of the young . -Given the access to technology , even a school in a tiny rural hamlet could project a DVD onto a white board . -In the first nine months we ran 25 clubs across the U.K. , with kids in age groups between five and 18 watching a film uninterrupted for 90 minutes . -The films were curated and contextualized . -But the choice was theirs , and our audience quickly grew to choose the richest and most varied diet that we could provide . -The outcome , immediate . -It was an education of the most profound and transformative kind . 
-In groups as large as 150 and as small as three , these young people discovered new places , new thoughts , new perspectives . -By the time the pilot had finished , we had the names of a thousand schools that wished to join . -The film that changed my life is a 1951 film by Vittorio De Sica , " Miracle in Milan . " -It 's a remarkable comment on slums , poverty and aspiration . -I had seen the film on the occasion of my father 's 50th birthday . -Technology then meant we had to hire a viewing cinema , find and pay for the print and the projectionist . -But for my father , the emotional and artistic importance of De Sica 's vision was so great that he chose to celebrate his half-century with his three teenage children and 30 of their friends , " In order , " he said , " to pass the baton of concern and hope on to the next generation . " -In the last shot of " Miracle in Milan , " slum-dwellers float skyward on flying brooms . -Sixty years after the film was made and 30 years after I first saw it , I see young faces tilt up in awe , their incredulity matching mine . -And the speed with which they associate it with " Slumdog Millionaire " or the favelas in Rio speaks to the enduring nature . -In a FILMCLUB season about democracy and government , we screened " Mr. Smith Goes to Washington . " -Made in 1939 , the film is older than most of our members ' grandparents . -Frank Capra 's classic values independence and propriety . -It shows how to do right , how to be heroically awkward . -It is also an expression of faith in the political machine as a force of honor . -Shortly after " Mr. Smith " became a FILMCLUB classic , there was a week of all-night filibustering in the House of Lords . -And it was with great delight that we found young people up and down the country explaining with authority what filibustering was and why the Lords might defy their bedtime on a point of principle . -After all , Jimmy Stewart filibustered for two entire reels . 
-In choosing " Hotel Rwanda , " they explored genocide of the most brutal kind . -It provoked tears as well as incisive questions about unarmed peace-keeping forces and the double-dealing of a Western society that picks its moral fights with commodities in mind . -And when " Schindler 's List " demanded that they never forget , one child , full of the pain of consciousness , remarked , " We already forgot , otherwise how did ' Hotel Rwanda ' happen ? " -As they watched more films , their lives got palpably richer . -" Pickpocket " started a debate about criminality disenfranchisement . -" To Sir , with Love " ignited its teen audience . -They celebrated a change in attitude towards non-white Britons , but railed against our restless school system that does not value collective identity , unlike that offered by Sidney Poitier 's careful tutelage . -By now , these thoughtful , opinionated , curious young people thought nothing of tackling films of all forms -- black and white , subtitled , documentary , non-narrative , fantasy -- and thought nothing of writing detailed reviews that competed to favor one film over another in passionate and increasingly sophisticated prose . -Six thousand reviews each school week vying for the honor of being review of the week . -From 25 clubs , we became hundreds , then thousands , until we were nearly a quarter of a million kids in 7,000 clubs right across the country . -And although the numbers were , and continue to be , extraordinary , what became more extraordinary was how the experience of critical and curious questioning translated into life . -Some of our kids started talking with their parents , others with their teachers , or with their friends . -And those without friends started making them . -The films provided communality across all manner of divide . -And the stories they held provided a shared experience . 
-" Persepolis " brought a daughter closer to her Iranian mother , and " Jaws " became the way in which one young boy was able to articulate the fear he 'd experienced in flight from violence that killed first his father then his mother , the latter thrown overboard on a boat journey . -Who was right , who wrong ? -What would they do under the same conditions ? -Was the tale told well ? -Was there a hidden message ? -How has the world changed ? How could it be different ? -A tsunami of questions flew out of the mouths of children who the world didn 't think were interested . -And they themselves had not known they cared . -And as they wrote and debated , rather than seeing the films as artifacts , they began to see themselves . -I have an aunt who is a wonderful storyteller . -In a moment she can invoke images of running barefoot on Table Mountain and playing cops and robbers . -Quite recently she told me that in 1948 , two of her sisters and my father traveled on a boat to Israel without my grandparents . -When the sailors mutinied at sea in a demand for humane conditions , it was these teenagers that fed the crew . -I was past 40 when my father died . -He never mentioned that journey . -My mother 's mother left Europe in a hurry without her husband , but with her three-year-old daughter and diamonds sewn into the hem of her skirt . -After two years in hiding , my grandfather appeared in London . -He was never right again . -And his story was hushed as he assimilated . -My story started in England with a clean slate and the silence of immigrant parents . -I had " Anne Frank , " " The Great Escape , " " Shoah , " " Triumph of the Will . " -It was Leni Riefenstahl in her elegant Nazi propaganda who gave context to what the family had to endure . -These films held what was too hurtful to say out loud , and they became more useful to me than the whispers of survivors and the occasional glimpse of a tattoo on a maiden aunt 's wrist . 
-Purists may feel that fiction dissipates the quest of real human understanding , that film is too crude to tell a complex and detailed history , or that filmmakers always serve drama over truth . -But within the reels lie purpose and meaning . -As one 12-year-old said after watching " Wizard of Oz , " " Every person should watch this , because unless you do you may not know that you too have a heart . " -We honor reading , why not honor watching with the same passion ? -Consider " Citizen Kane " as valuable as Jane Austen . -Agree that " Boyz n the Hood , " like Tennyson , offers an emotional landscape and a heightened understanding that work together . -Each a piece of memorable art , each a brick in the wall of who we are . -And it 's okay if we remember Tom Hanks better than astronaut Jim Lovell or have Ben Kingsley 's face superimposed onto that of Gandhi 's . -And though not real , Eve Harrington , Howard Beale , Mildred Pierce are an opportunity to discover what it is to be human , and no less helpful to understanding our life and times as Shakespeare is in illuminating the world of Elizabethan England . -We guessed that film , whose stories are a meeting place of drama , music , literature and human experience , would engage and inspire the young people participating in FILMCLUB . -What we could not have foreseen was the measurable improvements in behavior , confidence and academic achievement . -Once-reluctant students now race to school , talk to their teachers , fight , not on the playground , but to choose next week 's film -- young people who have found self-definition , ambition and an appetite for education and social engagement from the stories they have witnessed . -Our members defy the binary description of how we so often describe our young . -They are neither feral nor myopically self-absorbed . -They are , like other young people , negotiating a world with infinite choice , but little culture of how to find meaningful experience . 
-We appeared surprised at the behaviors of those who define themselves by the size of the tick on their shoes , yet acquisition has been the narrative we have offered . -If we want different values we have to tell a different story , a story that understands that an individual narrative is an essential component of a person 's identity , that a collective narrative is an essential component of a cultural identity , and without it it is impossible to imagine yourself as part of a group . -Because when these people get home after a screening of " Rear Window " and raise their gaze to the building next door , they have the tools to wonder who , apart from them , is out there and what is their story . -Thank you . -Ellen Jorgensen : Biohacking -- you can do it , too -We have personal computing , why not personal biotech ? That 's the question biologist Ellen Jorgensen and her colleagues asked themselves before opening Genspace , a nonprofit DIYbio lab in Brooklyn devoted to citizen science , where amateurs can go and tinker with biotechnology . Far from being a sinister Frankenstein 's lab , Genspace offers a long list of fun , creative and practical uses for DIYbio . -It 's a great time to be a molecular biologist . -Reading and writing DNA code is getting easier and cheaper . -By the end of this year , we 'll be able to sequence the three billion bits of information in your genome in less than a day and for less than 1,000 euros . -Biotech is probably the most powerful and the fastest-growing technology sector . -It has the power , potentially , to replace our fossil fuels , to revolutionize medicine , and to touch every aspect of our daily lives . -So who gets to do it ? -I think we 'd all be pretty comfortable with this guy doing it . -But what about that guy ? -In 2009 , I first heard about DIYbio . -It 's a movement that -- it advocates making biotechnology accessible to everyone , not just scientists and people in government labs . 
-The idea is that if you open up the science and you allow diverse groups to participate , it could really stimulate innovation . -Putting technology in the hands of the end user is usually a good idea because they 've got the best idea of what their needs are . -And here 's this really sophisticated technology coming down the road , all these associated social , moral , ethical questions , and we scientists are just lousy at explaining to the public just exactly what it is we 're doing in those labs . -So wouldn 't it be nice if there was a place in your local neighborhood where you could go and learn about this stuff , do it hands-on ? -I thought so . -So , three years ago , I got together with some friends of mine who had similar aspirations and we founded Genspace . -It 's a nonprofit , a community biotech lab in Brooklyn , New York , and the idea was people could come , they could take classes and putter around in the lab in a very open , friendly atmosphere . -None of my previous experience prepared me for what came next . Can you guess ? -The press started calling us . -And the more we talked about how great it was to increase science literacy , the more they wanted to talk about us creating the next Frankenstein , and as a result , for the next six months , when you Googled my name , instead of getting my scientific papers , you got this . -[ " Am I a biohazard ? " ] It was pretty depressing . -The only thing that got us through that period was that we knew that all over the world , there were other people that were trying to do the same thing that we were . -They were opening biohacker spaces , and some of them were facing much greater challenges than we did , more regulations , less resources . -But now , three years later , here 's where we stand . -It 's a vibrant , global community of hackerspaces , and this is just the beginning . -These are some of the biggest ones , and there are others opening every day . 
-There 's one probably going to open up in Moscow , one in South Korea , and the cool thing is they each have their own individual flavor that grew out of the community they came out of . -Let me take you on a little tour . -Biohackers work alone . -We work in groups , in big cities -- and in small villages . -We reverse engineer lab equipment . -We genetically engineer bacteria . -We hack hardware , software , wetware , and , of course , the code of life . -We like to build things . -Then we like to take things apart . -We make things grow . -We make things glow . -And we make cells dance . -The spirit of these labs , it 's open , it 's positive , but , you know , sometimes when people think of us , the first thing that comes to mind is bio-safety , bio-security , all the dark side stuff . -I 'm not going to minimize those concerns . -Any powerful technology is inherently dual use , and , you know , you get something like synthetic biology , nanobiotechnology , it really compels you , you have to look at both the amateur groups but also the professional groups , because they have better infrastructure , they have better facilities , and they have access to pathogens . -So the United Nations did just that , and they recently issued a report on this whole area , and what they concluded was the power of this technology for positive was much greater than the risk for negative , and they even looked specifically at the DIYbio community , and they noted , not surprisingly , that the press had a tendency to consistently overestimate our capabilities and underestimate our ethics . -As a matter of fact , DIY people from all over the world , America , Europe , got together last year , and we hammered out a common code of ethics . -That 's a lot more than conventional science has done . -Now , we follow state and local regulations . -We dispose of our waste properly , we follow safety procedures , we don 't work with pathogens . 
-You know , if you 're working with a pathogen , you 're not part of the biohacker community , you 're part of the bioterrorist community , I 'm sorry . -And sometimes people ask me , " Well , what about an accident ? " -Well , working with the safe organisms that we normally work with , the chance of an accident happening with somebody accidentally creating , like , some sort of superbug , that 's literally about as probable as a snowstorm in the middle of the Sahara Desert . -Now , it could happen , but I 'm not going to plan my life around it . -I 've actually chosen to take a different kind of risk . -I signed up for something called the Personal Genome Project . -It 's a study at Harvard where , at the end of the study , they 're going to take my entire genomic sequence , all of my medical information , and my identity , and they 're going to post it online for everyone to see . -There were a lot of risks involved that they talked about during the informed consent portion . -The one I liked the best is , someone could download my sequence , go back to the lab , synthesize some fake Ellen DNA , and plant it at a crime scene . -But like DIYbio , the positive outcomes and the potential for good for a study like that far outweighs the risk . -Now , you might be asking yourself , " Well , you know , what would I do in a biolab ? " -Well , it wasn 't that long ago we were asking , " Well , what would anyone do with a personal computer ? " -So this stuff is just beginning . -We 're only seeing just the tip of the DNA iceberg . -Let me show you what you could do right now . -A biohacker in Germany , a journalist , wanted to know whose dog was leaving little presents on his street ? -Yep , you guessed it . He threw tennis balls to all the neighborhood dogs , analyzed the saliva , identified the dog , and confronted the dog owner . -I discovered an invasive species in my own backyard . -Looked like a ladybug , right ? -It actually is a Japanese beetle . 
-And the same kind of technology -- it 's called DNA barcoding , it 's really cool -- You can use it to check if your caviar is really beluga , if that sushi is really tuna , or if that goat cheese that you paid so much for is really goat 's . -In a biohacker space , you can analyze your genome for mutations . -You can analyze your breakfast cereal for GMO 's , and you can explore your ancestry . -You can send weather balloons up into the stratosphere , collect microbes , see what 's up there . -You can make a biosensor out of yeast to detect pollutants in water . -You can make some sort of a biofuel cell . -You can do a lot of things . -You can also do an art science project . Some of these are really spectacular , and they look at social , ecological problems from a completely different perspective . -It 's really cool . -Some people ask me , well , why am I involved ? -I could have a perfectly good career in mainstream science . -The thing is , there 's something in these labs that they have to offer society that you can 't find anywhere else . -There 's something sacred about a space where you can work on a project , and you don 't have to justify to anyone that it 's going to make a lot of money , that it 's going to save mankind , or even that it 's feasible . -It just has to follow safety guidelines . -If you had spaces like this all over the world , it could really change the perception of who 's allowed to do biotech . -It 's spaces like these that spawned personal computing . -Why not personal biotech ? -If everyone in this room got involved , who knows what we could do ? -This is such a new area , and as we say back in Brooklyn , you ain 't seen nothin ' yet . -Geert Chatrou : A whistleblower you haven 't heard -In this engaging talk , world champion whistler Geert Chatrou performs the whimsical " Eleonora " by A. Honhoff , and his own " Fête de la Belle . " In a fascinating interlude , he talks about what brought him to the craft . 
-Thank you very much . -That was whistling . -I 'm trying to do this in English . -What is a chubby , curly-haired guy from Holland -- why is he whistling ? -Well actually , I 've [ been ] whistling since the age of four , about four . -My dad was always whistling around the house , and I just thought that 's part of communication in my family . -So I whistled along with him . -And actually , till I was 34 , I always annoyed and irritated people with whistling , because , to be honest , my whistling is a kind of deviant behavior . -I whistled alone . I whistled in the classroom . -I whistled on [ my ] bike . I whistled everywhere . -And I also whistled at a Christmas Eve party with my family-in-law . -And they had some , in my opinion , terrible Christmas music . -And when I hear music that I don 't like , I try to make it better . -So " Rudolph the Red-Nosed Reindeer " -- you know it ? -But it can also sound like this . -But during a Christmas party -- at dinner actually -- it 's very annoying . -So my sister-in-law asked me a few times , " Please stop whistling . " -And I just couldn 't . -And at one point -- and I had some wine , I have to admit that -- at one point I said , " If there was a contest , I would join . " -And two weeks later I received a text message : " You 're going to America . " -So , okay , I 'm going to America . -I would love to , but why ? -So I immediately called her up , of course . -She Googled , and she found this World Whistling Championship in America , of course . -She didn 't expect me to go there . -And I would have lost my face . -I don 't know if that 's correct English . -But the Dutch people here will understand what I mean . -I lost my face . -And she thought , " He will never go there . " -But actually I did . -So I went to Louisburg , North Carolina , southeast United States , and I entered the world of whistling . -And I also entered the world championship , and I won there in 2004 . 
-That was great fun , of course . -And to defend my title -- like judokas do and sportsmen -- I thought , well let 's go back in 2005 , and I won again . -Then I couldn 't participate for a few years . -And in 2008 I entered again in Japan , Tokyo , and I won again . -So what happened now is I 'm standing here in Rotterdam , in the beautiful city , on a big stage , and I 'm talking about whistling . -And actually I earn my money whistling at the moment . -So I quit my day job as a nurse . -And I try to live my dream -- well , actually , it was never my dream , but it sounds so good . -Okay , I 'm not the only one whistling here . -You say , " Huh , what do you mean ? " -Well actually , you are going to whistle along . -And then always the same thing happens : people are watching each other and think , " Oh , my God . -Why ? Can I go away ? " -No , you can 't . -Actually it 's very simple . -The track that I will whistle is called " Fête de la Belle . " -It 's about 80 minutes long . -No , no , no . It 's four minutes long . -And I want to first rehearse with you your whistling . -So I whistle the tone . -Sorry . I forgot one thing . -You whistle the same tone as me . -I heard a wide variety of tones . -This is very promising . -This is very promising . -I 'll ask the technicians to start the music . -And if it 's started , I just point where you whistle along , and we will see what happens . -Oh , hah . -I 'm so sorry , technicians . -I 'm so used to that . -I start it myself . -Okay , here it is . -Okay . -It 's easy , isn 't it ? -Now comes the solo . I propose I do that myself . -Max Westerman : Geert Chatrou , the World Champion [ of ] Whistling . -Geert Chatrou : Thank you . Thank you . -Roberto D 'Angelo + Francesca Fedeli : In our baby 's illness , a life lesson -Roberto D 'Angelo and Francesca Fedeli thought their baby boy Mario was healthy -- until at 10 days old , they discovered he 'd had a perinatal stroke . 
With Mario unable to control the left side of his body , they grappled with tough questions : Would he be " normal ? " Could he live a full life ? The poignant story of parents facing their fears -- and how they turned them around . -Francesca Fedeli : Ciao . -So he 's Mario . He 's our son . -He was born two and a half years ago , and I had a pretty tough pregnancy because I had to stay still in a bed for , like , eight months . -But in the end everything seemed to be under control . -So he got the right weight at birth . -He got the right Apgar index . -So we were pretty reassured by this . -But at the end , 10 days later after he was born , we discovered that he had a stroke . -As you might know , a stroke is a brain injury . -A perinatal stroke could be something that can happen during the nine months of pregnancy or just suddenly after the birth , and in his case , as you can see , the right part of his brain has gone . -So the effect that this stroke could have on Mario 's body could be the fact that he couldn 't be able to control the left side of his body . -Just imagine , if you have a computer and a printer and you want to transmit , to input to print out a document , but the printer doesn 't have the right drives , so the same is for Mario . -It 's just like , he would like to move his left side of his body , but he 's not able to transmit the right input to move his left arm and left leg . -So life had to change . -We needed to change our schedule . -We needed to change the impact that this birth had on our life . -As you may imagine , unfortunately , we were not ready . -Nobody taught us how to deal with such kinds of disabilities , and as many questions as possible started to come to our minds . -And that has been really a tough time . -Questions , some basics , like , you know , why did this happen to us ? -And what went wrong ? -Some more tough , like , really , what will be the impact on Mario 's life ? 
-I mean , at the end , will he be able to work ? -Will he be able to be normal ? -And , you know , as a parent , especially for the first time , why is he not going to be better than us ? -And this , indeed , really is tough to say , but a few months later , we realized that we were really feeling like a failure . -I mean , the only real product of our life , at the end , was a failure . -And you know , it was not a failure for ourselves in itself , but it was a failure that will impact his full life . -Honestly , we went down . -I mean we went really down , but at the end , we started to look at him , and we said , we have to react . -So immediately , as Francesca said , we changed our life . -We started physiotherapy , we started the rehabilitation , and one of the paths that we were following in terms of rehabilitation is the mirror neurons pilot . -Basically , we spent months doing this with Mario . -You have an object , and we showed him how to grab the object . -Now , the theory of mirror neurons simply says that in your brains , exactly now , as you watch me doing this , you are activating exactly the same neurons as if you do the actions . -It looks like this is the leading edge in terms of rehabilitation . -But one day we found that Mario was not looking at our hand . -He was looking at us . -We were his mirror . -And the problem , as you might feel , is that we were down , we were depressed , we were looking at him as a problem , not as a son , not from a positive perspective . -And that day really changed our perspective . -We realized that we had to become a better mirror for Mario . -We restarted from our strengths , and at the same time we restarted from his strengths . -We stopped looking at him as a problem , and we started to look at him as an opportunity to improve . -And really , this was the change , and from our side , we said , " What are our strengths that we really can bring to Mario ? " -And we started from our passions . 
-I mean , at the end , my wife and myself are quite different , but we have many things in common . -We love to travel , we love music , we love to be in places like this , and we started to bring Mario with us just to show to him the best things that we can show to him . -This short video is from last week . -I am not saying -- I am not saying it 's a miracle . That 's not the message , because we are just at the beginning of the path . -But we want to share what was the key learning , the key learning that Mario drove to us , and it is to consider what you have as a gift and not only what you miss , and to consider what you miss just as an opportunity . -And this is the message that we want to share with you . -This is why we are here . -Mario ! -And this is why -- And this is why we decided to share the best mirror in the world with him . -And we thank you so much , all of you . -Thank you . Thank you . Bye . -Thank you . -Mark Shaw : One very dry demo -Mark Shaw demos Ultra-Ever Dry , a liquid-repellent coating that acts as an astonishingly powerful shield against water and water-based materials . At the nano level , the spray covers a surface with an umbrella of air so that water bounces right off . Watch for an exciting two-minute kicker . -I 'm here to show you how something you can 't see can be so much fun to look at . -You 're about to experience a new , available and exciting technology that 's going to make us rethink how we waterproof our lives . -What I have here is a cinder block that we 've coated half with a nanotechnology spray that can be applied to almost any material . -It 's called Ultra-Ever Dry , and when you apply it to any material , it turns into a superhydrophobic shield . -So this is a cinder block , uncoated , and you can see that it 's porous , it absorbs water . -Not anymore . -Porous , nonporous . -So what 's superhydrophobic ? -Superhydrophobic is how we measure a drop of water on a surface . 
-The rounder it is , the more hydrophobic it is , and if it 's really round , it 's superhydrophobic . -A freshly waxed car , the water molecules slump to about 90 degrees . -A windshield coating is going to give you about 110 degrees . -But what you 're seeing here is 160 to 175 degrees , and anything over 150 is superhydrophobic . -So as part of the demonstration , what I have is a pair of gloves , and we 've coated one of the gloves with the nanotechnology coating , and let 's see if you can tell which one , and I 'll give you a hint . -Did you guess the one that was dry ? -When you have nanotechnology and nanoscience , what 's occurred is that we 're able to now look at atoms and molecules and actually control them for great benefits . -And we 're talking really small here . -The way you measure nanotechnology is in nanometers , and one nanometer is a billionth of a meter , and to put some scale to that , if you had a nanoparticle that was one nanometer thick , and you put it side by side , and you had 50,000 of them , you 'd be the width of a human hair . -So very small , but very useful . -And it 's not just water that this works with . -It 's a lot of water-based materials like concrete , water-based paint , mud , and also some refined oils as well . -You can see the difference . -Moving onto the next demonstration , we 've taken a pane of glass and we 've coated the outside of it , we 've framed it with the nanotechnology coating , and we 're going to pour this green-tinted water inside the middle , and you 're going to see , it 's going to spread out on glass like you 'd normally think it would , except when it hits the coating , it stops , and I can 't even coax it to leave . -It 's that afraid of the water . -So what 's going on here ? What 's happening ? -Well , the surface of the spray coating is actually filled with nanoparticles that form a very rough and craggly surface . -You 'd think it 'd be smooth , but it 's actually not . 
-And it has billions of interstitial spaces , and those spaces , along with the nanoparticles , reach up and grab the air molecules , and cover the surface with air . -It 's an umbrella of air all across it , and that layer of air is what the water hits , the mud hits , the concrete hits , and it glides right off . -So if I put this inside this water here , you can see a silver reflective coating around it , and that silver reflective coating is the layer of air that 's protecting the water from touching the paddle , and it 's dry . -So what are the applications ? -I mean , many of you right now are probably going through your head . -Everyone that sees this gets excited , and says , " Oh , I could use it for this and this and this . " -The applications in a general sense could be anything that 's anti-wetting . -We 've certainly seen that today . -It could be anything that 's anti-icing , because if you don 't have water , you don 't have ice . -It could be anti-corrosion . -No water , no corrosion . -It could be anti-bacterial . -Without water , the bacteria won 't survive . -And it could be things that need to be self-cleaning as well . -So imagine how something like this could help revolutionize your field of work . -And I 'm going to leave you with one last demonstration , but before I do that , I would like to say thank you , and think small . -It 's going to happen . Wait for it . Wait for it . -You guys didn 't hear about us cutting out the Design from TED ? -[ Two minutes later ... ] He ran into all sorts of problems in terms of managing the medical research part . -It 's happening ! -Dan Ariely : Our buggy moral code -Behavioral economist Dan Ariely studies the bugs in our moral code : the hidden reasons we think it 's OK to cheat or steal . Clever studies help make his point that we 're predictably irrational -- and can be influenced in ways we can 't grasp . -I want to talk to you today a little bit about predictable irrationality . 
-And my interest in irrational behavior started many years ago in the hospital . -I was burned very badly . -And if you spend a lot of time in hospital , you 'll see a lot of types of irrationalities . -And the one that particularly bothered me in the burn department was the process by which the nurses took the bandage off me . -Now , you must have all taken a Band-Aid off at some point , and you must have wondered what 's the right approach . -Do you rip it off quickly -- short duration but high intensity -- or do you take your Band-Aid off slowly -- you take a long time , but each second is not as painful -- which one of those is the right approach ? -The nurses in my department thought that the right approach was the ripping one , so they would grab hold and they would rip , and they would grab hold and they would rip . -And because I had 70 percent of my body burned , it would take about an hour . -And as you can imagine , I hated that moment of ripping with incredible intensity . -And I would try to reason with them and say , " Why don 't we try something else ? -Why don 't we take it a little longer -- maybe two hours instead of an hour -- and have less of this intensity ? " -And the nurses told me two things . -They told me that they had the right model of the patient -- that they knew what was the right thing to do to minimize my pain -- and they also told me that the word patient doesn 't mean to make suggestions or to interfere or ... -This is not just in Hebrew , by the way . -It 's in every language I 've had experience with so far . -And , you know , there 's not much -- there wasn 't much I could do , and they kept on doing what they were doing . -And about three years later , when I left the hospital , I started studying at the university . 
-And one of the most interesting lessons I learned was that there is an experimental method that if you have a question you can create a replica of this question in some abstract way , and you can try to examine this question , maybe learn something about the world . -So that 's what I did . -I was still interested in this question of how do you take bandages off burn patients . -So originally I didn 't have much money , so I went to a hardware store and I bought a carpenter 's vice . -And I would bring people to the lab and I would put their finger in it , and I would crunch it a little bit . -And I would crunch it for long periods and short periods , and pain that went up and pain that went down , and with breaks and without breaks -- all kinds of versions of pain . -And when I finished hurting people a little bit , I would ask them , so , how painful was this ? Or , how painful was this ? -Or , if you had to choose between the last two , which one would you choose ? -I kept on doing this for a while . -And then , like all good academic projects , I got more funding . -I moved to sounds , electrical shocks -- I even had a pain suit that I could get people to feel much more pain . 
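The `gpt_2.py` deleted in the next hunk builds its causal attention mask arithmetically from index ranges rather than with `tf.matrix_band_part` (its docstring notes the latter "produces garbage on TPUs"). A minimal NumPy sketch of that same masking rule, for illustration only — the deleted file does this with `tf.range` and `tf.cast`:

```python
import numpy as np

def attention_mask(nd, ns):
    # 1s in the lower triangle, counting from the lower-right corner:
    # query position i may attend to key position j only when j - (ns - nd) <= i.
    # With nd == ns this reduces to an ordinary lower-triangular causal mask.
    i = np.arange(nd)[:, None]
    j = np.arange(ns)
    return (i >= j - ns + nd).astype(np.float32)

print(attention_mask(3, 3))
```

When `ns > nd` (decoding with cached `past` keys), the extra leading columns are all ones, so new queries may attend to every cached position.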
diff --git a/neural-machine-translation/gpt_2.py b/neural-machine-translation/gpt_2.py deleted file mode 100644 index 147ae6e..0000000 --- a/neural-machine-translation/gpt_2.py +++ /dev/null @@ -1,211 +0,0 @@ -import numpy as np -import tensorflow as tf - - -def shape_list(x): - """Deal with dynamic shape in tensorflow cleanly.""" - static = x.shape.as_list() - dynamic = tf.shape(x) - return [dynamic[i] if s is None else s for i, s in enumerate(static)] - - -def softmax(x, axis = -1): - x = x - tf.reduce_max(x, axis = axis, keepdims = True) - ex = tf.exp(x) - return ex / tf.reduce_sum(ex, axis = axis, keepdims = True) - - -def gelu(x): - return ( - 0.5 - * x - * (1 + tf.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))) - ) - - -def norm(x, scope, *, axis = -1, epsilon = 1e-5): - """Normalize to mean = 0, std = 1, then do a diagonal affine transform.""" - with tf.variable_scope(scope): - n_state = x.shape[-1].value - g = tf.get_variable( - 'g', [n_state], initializer = tf.constant_initializer(1) - ) - b = tf.get_variable( - 'b', [n_state], initializer = tf.constant_initializer(0) - ) - u = tf.reduce_mean(x, axis = axis, keepdims = True) - s = tf.reduce_mean(tf.square(x - u), axis = axis, keepdims = True) - x = (x - u) * tf.rsqrt(s + epsilon) - x = x * g + b - return x - - -def split_states(x, n): - """Reshape the last dimension of x into [n, x.shape[-1]/n].""" - *start, m = shape_list(x) - return tf.reshape(x, start + [n, m // n]) - - -def merge_states(x): - """Smash the last two dimensions of x into a single dimension.""" - *start, a, b = shape_list(x) - return tf.reshape(x, start + [a * b]) - - -def conv1d(x, scope, nf, *, w_init_stdev = 0.02): - with tf.variable_scope(scope): - *start, nx = shape_list(x) - w = tf.get_variable( - 'w', - [1, nx, nf], - initializer = tf.random_normal_initializer(stddev = w_init_stdev), - ) - b = tf.get_variable('b', [nf], initializer = tf.constant_initializer(0)) - c = tf.reshape( - tf.matmul(tf.reshape(x, [-1, nx]), 
tf.reshape(w, [-1, nf])) + b, - start + [nf], - ) - return c - - -def attention_mask(nd, ns, *, dtype): - """1's in the lower triangle, counting from the lower right corner. - - Same as tf.matrix_band_part(tf.ones([nd, ns]), -1, ns-nd), but doesn't produce garbage on TPUs. - """ - i = tf.range(nd)[:, None] - j = tf.range(ns) - m = i >= j - ns + nd - return tf.cast(m, dtype) - - -def attn(x, scope, n_state, *, past, hparams): - assert x.shape.ndims == 3 # Should be [batch, sequence, features] - assert n_state % hparams.n_head == 0 - if past is not None: - assert ( - past.shape.ndims == 5 - ) # Should be [batch, 2, heads, sequence, features], where 2 is [k, v] - - def split_heads(x): - # From [batch, sequence, features] to [batch, heads, sequence, features] - return tf.transpose(split_states(x, hparams.n_head), [0, 2, 1, 3]) - - def merge_heads(x): - # Reverse of split_heads - return merge_states(tf.transpose(x, [0, 2, 1, 3])) - - def mask_attn_weights(w): - # w has shape [batch, heads, dst_sequence, src_sequence], where information flows from src to dst. 
- _, _, nd, ns = shape_list(w) - b = attention_mask(nd, ns, dtype = w.dtype) - b = tf.reshape(b, [1, 1, nd, ns]) - w = w * b - tf.cast(1e10, w.dtype) * (1 - b) - return w - - def multihead_attn(q, k, v): - # q, k, v have shape [batch, heads, sequence, features] - w = tf.matmul(q, k, transpose_b = True) - w = w * tf.rsqrt(tf.cast(v.shape[-1].value, w.dtype)) - - w = mask_attn_weights(w) - w = softmax(w) - a = tf.matmul(w, v) - return a - - with tf.variable_scope(scope): - c = conv1d(x, 'c_attn', n_state * 3) - q, k, v = map(split_heads, tf.split(c, 3, axis = 2)) - present = tf.stack([k, v], axis = 1) - if past is not None: - pk, pv = tf.unstack(past, axis = 1) - k = tf.concat([pk, k], axis = -2) - v = tf.concat([pv, v], axis = -2) - a = multihead_attn(q, k, v) - a = merge_heads(a) - a = conv1d(a, 'c_proj', n_state) - return a, present - - -def mlp(x, scope, n_state, *, hparams): - with tf.variable_scope(scope): - nx = x.shape[-1].value - h = gelu(conv1d(x, 'c_fc', n_state)) - h2 = conv1d(h, 'c_proj', nx) - return h2 - - -def block(x, scope, *, past, hparams): - with tf.variable_scope(scope): - nx = x.shape[-1].value - a, present = attn( - norm(x, 'ln_1'), 'attn', nx, past = past, hparams = hparams - ) - x = x + a - m = mlp(norm(x, 'ln_2'), 'mlp', nx * 4, hparams = hparams) - x = x + m - return x, present - - -def past_shape(*, hparams, batch_size = None, sequence = None): - return [ - batch_size, - hparams.n_layer, - 2, - hparams.n_head, - sequence, - hparams.n_embd // hparams.n_head, - ] - - -def expand_tile(value, size): - """Add a new axis of given size.""" - value = tf.convert_to_tensor(value, name = 'value') - ndims = value.shape.ndims - return tf.tile(tf.expand_dims(value, axis = 0), [size] + [1] * ndims) - - -def positions_for(tokens, past_length): - batch_size = tf.shape(tokens)[0] - nsteps = tf.shape(tokens)[1] - return expand_tile(past_length + tf.range(nsteps), batch_size) - - -def model(hparams, X, past = None, scope = 'model', reuse = False): - with 
tf.variable_scope(scope, reuse = reuse): - results = {} - batch, sequence = shape_list(X) - - wpe = tf.get_variable( - 'wpe', - [hparams.n_ctx, hparams.n_embd], - initializer = tf.random_normal_initializer(stddev = 0.01), - ) - wte = tf.get_variable( - 'wte', - [hparams.n_vocab, hparams.n_embd], - initializer = tf.random_normal_initializer(stddev = 0.02), - ) - past_length = 0 if past is None else tf.shape(past)[-2] - h = tf.gather(wte, X) + tf.gather(wpe, positions_for(X, past_length)) - - # Transformer - presents = [] - pasts = ( - tf.unstack(past, axis = 1) - if past is not None - else [None] * hparams.n_layer - ) - assert len(pasts) == hparams.n_layer - for layer, past in enumerate(pasts): - h, present = block(h, 'h%d' % layer, past = past, hparams = hparams) - presents.append(present) - results['present'] = tf.stack(presents, axis = 1) - h = norm(h, 'ln_f') - - # Language model loss. Do tokens 100 or len(train_fr[i].split()) > 100:\n", + " continue\n", + " train_X.append(train_en[i])\n", + " train_Y.append(train_fr[i])\n", + " \n", + "len(train_X), len(train_Y)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X, test_Y = train_X[-5000:], train_Y[-5000:]\n", + "train_X, train_Y = train_X[:200000], train_Y[:200000]" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json', 'w') as fopen:\n", + " json.dump({'train_X': train_X, 'train_Y': train_Y, 'test_X': test_X, 'test_Y': test_Y}, fopen)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + 
"nbformat_minor": 2 +} diff --git a/neural-machine-translation/prepare-t2t.ipynb b/neural-machine-translation/prepare-t2t.ipynb new file mode 100644 index 0000000..7d071e4 --- /dev/null +++ b/neural-machine-translation/prepare-t2t.ipynb @@ -0,0 +1,94 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "!mkdir train\n", + "!mkdir test" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open('dataset.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "train_X = data['train_X']\n", + "train_Y = data['train_Y']\n", + "test_X = data['test_X']\n", + "test_Y = data['test_Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('train/before.txt', 'w') as fopen:\n", + " fopen.write('\\n'.join(train_X))\n", + " \n", + "with open('train/after.txt', 'w') as fopen:\n", + " fopen.write('\\n'.join(train_Y))\n", + " \n", + "with open('test/before.txt', 'w') as fopen:\n", + " fopen.write('\\n'.join(test_X))\n", + " \n", + "with open('test/after.txt', 'w') as fopen:\n", + " fopen.write('\\n'.join(test_Y))" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "train/\n", + "train/after.txt\n", + "train/before.txt\n", + "test/\n", + "test/after.txt\n", + "test/before.txt\n" + ] + } + ], + "source": [ + "!tar -czvf train-translation.tar.gz train\n", + "!tar -czvf test-translation.tar.gz test" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", 
+ "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/neural-machine-translation/t/text_encoder.py b/neural-machine-translation/t/text_encoder.py new file mode 100644 index 0000000..560c41e --- /dev/null +++ b/neural-machine-translation/t/text_encoder.py @@ -0,0 +1,1146 @@ +# coding=utf-8 +# Copyright 2020 The Tensor2Tensor Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Encoders for text data. + +* TextEncoder: base class +* ByteTextEncoder: for ascii text +* TokenTextEncoder: with user-supplied vocabulary file +* SubwordTextEncoder: invertible +""" +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +from itertools import chain +import math +import re +import tempfile +import time +import numpy as np +import six +from six.moves import range # pylint: disable=redefined-builtin +from . import tokenizer + +import tensorflow.compat.v1 as tf + +# Reserved tokens for things like padding and EOS symbols. +PAD = '' +EOS = '' +RESERVED_TOKENS = [PAD, EOS] +NUM_RESERVED_TOKENS = len(RESERVED_TOKENS) +PAD_ID = RESERVED_TOKENS.index(PAD) # Normally 0 +EOS_ID = RESERVED_TOKENS.index(EOS) # Normally 1 + +if six.PY2: + RESERVED_TOKENS_BYTES = RESERVED_TOKENS +else: + RESERVED_TOKENS_BYTES = [bytes(PAD, 'ascii'), bytes(EOS, 'ascii')] + +# Regular expression for unescaping token strings. 
+# '\u' is converted to '_' +# '\\' is converted to '\' +# '\213;' is converted to unichr(213) +_UNESCAPE_REGEX = re.compile(r'\\u|\\\\|\\([0-9]+);') +_ESCAPE_CHARS = set(u'\\_u;0123456789') + + +# Unicode utility functions that work with Python 2 and 3 +def native_to_unicode(s): + if is_unicode(s): + return s + try: + return to_unicode(s) + except UnicodeDecodeError: + res = to_unicode(s, ignore_errors = True) + tf.logging.info('Ignoring Unicode error, outputting: %s' % res) + return res + + +def unicode_to_native(s): + if six.PY2: + return s.encode('utf-8') if is_unicode(s) else s + else: + return s + + +def is_unicode(s): + return isinstance(s, six.text_type) + + +def to_unicode(s, ignore_errors = False): + if is_unicode(s): + return s + error_mode = 'ignore' if ignore_errors else 'strict' + return s.decode('utf-8', errors = error_mode) + + +def to_unicode_ignore_errors(s): + return to_unicode(s, ignore_errors = True) + + +def to_unicode_utf8(s): + return unicode(s, 'utf-8') if six.PY2 else s.decode('utf-8') + + +def strip_ids(ids, ids_to_strip): + """Strip ids_to_strip from the end ids.""" + ids = list(ids) + while ids and ids[-1] in ids_to_strip: + ids.pop() + return ids + + +class TextEncoder(object): + """Base class for converting from ints to/from human readable strings.""" + + def __init__(self, num_reserved_ids = NUM_RESERVED_TOKENS): + self._num_reserved_ids = num_reserved_ids + + @property + def num_reserved_ids(self): + return self._num_reserved_ids + + def encode(self, s): + """Transform a human-readable string into a sequence of int ids. + + The ids should be in the range [num_reserved_ids, vocab_size). Ids [0, + num_reserved_ids) are reserved. + + EOS is not appended. + + Args: + s: human-readable string to be converted. + + Returns: + ids: list of integers + """ + return [int(w) + self._num_reserved_ids for w in s.split()] + + def decode(self, ids, strip_extraneous = False): + """Transform a sequence of int ids into a human-readable string. 
+ + EOS is not expected in ids. + + Args: + ids: list of integers to be converted. + strip_extraneous: bool, whether to strip off extraneous tokens + (EOS and PAD). + + Returns: + s: human-readable string. + """ + if strip_extraneous: + ids = strip_ids(ids, list(range(self._num_reserved_ids or 0))) + return ' '.join(self.decode_list(ids)) + + def decode_list(self, ids): + """Transform a sequence of int ids into a their string versions. + + This method supports transforming individual input/output ids to their + string versions so that sequence to/from text conversions can be visualized + in a human readable format. + + Args: + ids: list of integers to be converted. + + Returns: + strs: list of human-readable string. + """ + decoded_ids = [] + for id_ in ids: + if 0 <= id_ < self._num_reserved_ids: + decoded_ids.append(RESERVED_TOKENS[int(id_)]) + else: + decoded_ids.append(id_ - self._num_reserved_ids) + return [str(d) for d in decoded_ids] + + @property + def vocab_size(self): + raise NotImplementedError() + + +class ByteTextEncoder(TextEncoder): + """Encodes each byte to an id. 
For 8-bit strings only.""" + + def encode(self, s): + numres = self._num_reserved_ids + if six.PY2: + if isinstance(s, unicode): + s = s.encode('utf-8') + return [ord(c) + numres for c in s] + # Python3: explicitly convert to UTF-8 + return [c + numres for c in s.encode('utf-8')] + + def decode(self, ids, strip_extraneous = False): + if strip_extraneous: + ids = strip_ids(ids, list(range(self._num_reserved_ids or 0))) + numres = self._num_reserved_ids + decoded_ids = [] + int2byte = six.int2byte + for id_ in ids: + if 0 <= id_ < numres: + decoded_ids.append(RESERVED_TOKENS_BYTES[int(id_)]) + else: + decoded_ids.append(int2byte(id_ - numres)) + if six.PY2: + return ''.join(decoded_ids) + # Python3: join byte arrays and then decode string + return b''.join(decoded_ids).decode('utf-8', 'replace') + + def decode_list(self, ids): + numres = self._num_reserved_ids + decoded_ids = [] + int2byte = six.int2byte + for id_ in ids: + if 0 <= id_ < numres: + decoded_ids.append(RESERVED_TOKENS_BYTES[int(id_)]) + else: + decoded_ids.append(int2byte(id_ - numres)) + # Python3: join byte arrays and then decode string + return decoded_ids + + @property + def vocab_size(self): + return 2 ** 8 + self._num_reserved_ids + + +class ClassLabelEncoder(TextEncoder): + """Encoder for class labels.""" + + def __init__(self, class_labels = None, class_labels_fname = None): + super(ClassLabelEncoder, self).__init__(num_reserved_ids = 0) + + if class_labels_fname: + with tf.gfile.Open(class_labels_fname) as f: + class_labels = [label.strip() for label in f.readlines()] + + assert class_labels + self._class_labels = class_labels + + def encode(self, s): + label_str = s + return self._class_labels.index(label_str) + + def decode(self, ids, strip_extraneous = False): + del strip_extraneous + label_id = ids + if isinstance(label_id, list): + assert len(label_id) == 1 + label_id, = label_id + if isinstance(label_id, np.ndarray): + label_id = np.squeeze(label_id) + return self._class_labels[label_id] 
+ + def decode_list(self, ids): + return [self._class_labels[i] for i in ids] + + @property + def vocab_size(self): + return len(self._class_labels) + + +class OneHotClassLabelEncoder(ClassLabelEncoder): + """One-hot encoder for class labels.""" + + def encode( + self, label_str, on_value = 1, off_value = 0 + ): # pylint: disable=arguments-differ + e = np.full(self.vocab_size, off_value, dtype = np.int32) + e[self._class_labels.index(label_str)] = on_value + return e.tolist() + + def decode(self, ids, strip_extraneous = False): + del strip_extraneous + label_id = ids + if isinstance(label_id, np.ndarray): + label_id = np.squeeze(label_id).astype(np.int8).tolist() + assert isinstance(label_id, list) + assert len(label_id) == self.vocab_size + return self._class_labels[label_id.index(1)] + + @property + def vocab_size(self): + return len(self._class_labels) + + +class TokenTextEncoder(TextEncoder): + """Encoder based on a user-supplied vocabulary (file or list).""" + + def __init__( + self, + vocab_filename, + reverse = False, + vocab_list = None, + replace_oov = None, + num_reserved_ids = NUM_RESERVED_TOKENS, + ): + """Initialize from a file or list, one token per line. + + Handling of reserved tokens works as follows: + - When initializing from a list, we add reserved tokens to the vocab. + - When initializing from a file, we do not add reserved tokens to the vocab. + - When saving vocab files, we save reserved tokens to the file. + + Args: + vocab_filename: If not None, the full filename to read vocab from. If this + is not None, then vocab_list should be None. + reverse: Boolean indicating if tokens should be reversed during encoding + and decoding. + vocab_list: If not None, a list of elements of the vocabulary. If this is + not None, then vocab_filename should be None. + replace_oov: If not None, every out-of-vocabulary token seen when + encoding will be replaced by this string (which must be in vocab). 
+ num_reserved_ids: Number of IDs to save for reserved tokens like <pad>. + """ + super(TokenTextEncoder, self).__init__( + num_reserved_ids = num_reserved_ids + ) + self._reverse = reverse + self._replace_oov = replace_oov + if vocab_filename: + self._init_vocab_from_file(vocab_filename) + else: + assert vocab_list is not None + self._init_vocab_from_list(vocab_list) + + def encode(self, s): + """Converts a space-separated string of tokens to a list of ids.""" + sentence = s + tokens = sentence.strip().split() + if self._replace_oov is not None: + tokens = [ + t if t in self._token_to_id else self._replace_oov + for t in tokens + ] + ret = [self._token_to_id[tok] for tok in tokens] + return ret[::-1] if self._reverse else ret + + def decode(self, ids, strip_extraneous = False): + return ' '.join(self.decode_list(ids)) + + def decode_list(self, ids): + seq = reversed(ids) if self._reverse else ids + return [self._safe_id_to_token(i) for i in seq] + + @property + def vocab_size(self): + return len(self._id_to_token) + + def _safe_id_to_token(self, idx): + return self._id_to_token.get(idx, 'ID_%d' % idx) + + def _init_vocab_from_file(self, filename): + """Load vocab from a file. + + Args: + filename: The file to load vocabulary from. + """ + with tf.gfile.Open(filename) as f: + tokens = [token.strip() for token in f.readlines()] + + def token_gen(): + for token in tokens: + yield token + + self._init_vocab(token_gen(), add_reserved_tokens = False) + + def _init_vocab_from_list(self, vocab_list): + """Initialize tokens from a list of tokens. + + It is ok if reserved tokens appear in the vocab list. They will be + removed. The set of tokens in vocab_list should be unique. + + Args: + vocab_list: A list of tokens.
+ """ + + def token_gen(): + for token in vocab_list: + if token not in RESERVED_TOKENS: + yield token + + self._init_vocab(token_gen()) + + def _init_vocab(self, token_generator, add_reserved_tokens = True): + """Initialize vocabulary with tokens from token_generator.""" + + self._id_to_token = {} + non_reserved_start_index = 0 + + if add_reserved_tokens: + self._id_to_token.update(enumerate(RESERVED_TOKENS)) + non_reserved_start_index = len(RESERVED_TOKENS) + + self._id_to_token.update( + enumerate(token_generator, start = non_reserved_start_index) + ) + + # _token_to_id is the reverse of _id_to_token + self._token_to_id = dict( + (v, k) for k, v in six.iteritems(self._id_to_token) + ) + + def store_to_file(self, filename): + """Write vocab file to disk. + + Vocab files have one token per line. The file ends in a newline. Reserved + tokens are written to the vocab file as well. + + Args: + filename: Full path of the file to store the vocab to. + """ + with tf.gfile.Open(filename, 'w') as f: + for i in range(len(self._id_to_token)): + f.write(self._id_to_token[i] + '\n') + + +def _escape_token(token, alphabet): + """Escape away underscores and OOV characters and append '_'. + + This allows the token to be expressed as the concatenation of a list + of subtokens from the vocabulary. The underscore acts as a sentinel + which allows us to invertibly concatenate multiple such lists. + + Args: + token: A unicode string to be escaped. + alphabet: A set of all characters in the vocabulary's alphabet. + + Returns: + escaped_token: An escaped unicode string. + + Raises: + ValueError: If the provided token is not unicode. 
+ """ + if not isinstance(token, six.text_type): + raise ValueError('Expected string type for token, got %s' % type(token)) + + token = token.replace(u'\\', u'\\\\').replace(u'_', u'\\u') + ret = [ + c if c in alphabet and c != u'\n' else r'\%d;' % ord(c) for c in token + ] + return u''.join(ret) + '_' + + +def _unescape_token(escaped_token): + """Inverse of _escape_token(). + + Args: + escaped_token: a unicode string + + Returns: + token: a unicode string + """ + + def match(m): + if m.group(1) is None: + return u'_' if m.group(0) == u'\\u' else u'\\' + + try: + return six.unichr(int(m.group(1))) + except (ValueError, OverflowError) as _: + return u'\u3013' # Unicode for undefined character. + + trimmed = ( + escaped_token[:-1] if escaped_token.endswith('_') else escaped_token + ) + return _UNESCAPE_REGEX.sub(match, trimmed) + + +class SubwordTextEncoder(TextEncoder): + """Class for invertibly encoding text using a limited vocabulary. + + Invertibly encodes a native string as a sequence of subtokens from a limited + vocabulary. + + A SubwordTextEncoder is built from a corpus (so it is tailored to the text in + the corpus), and stored to a file. See text_encoder_build_subword.py. + + It can then be loaded and used to encode/decode any text. + + Encoding has four phases: + + 1. Tokenize into a list of tokens. Each token is a unicode string of either + all alphanumeric characters or all non-alphanumeric characters. We drop + tokens consisting of a single space that are between two alphanumeric + tokens. + + 2. Escape each token. This escapes away special and out-of-vocabulary + characters, and makes sure that each token ends with an underscore, and + has no other underscores. + + 3. Represent each escaped token as a the concatenation of a list of subtokens + from the limited vocabulary. Subtoken selection is done greedily from + beginning to end. 
That is, we construct the list in order, always picking + the longest subtoken in our vocabulary that matches a prefix of the + remaining portion of the encoded token. + + 4. Concatenate these lists. This concatenation is invertible due to the + fact that the trailing underscores indicate when one list is finished. + + """ + + def __init__(self, filename = None): + """Initialize and read from a file, if provided. + + Args: + filename: filename from which to read vocab. If None, do not load a + vocab + """ + self._alphabet = set() + self.filename = filename + if filename is not None: + self._load_from_file(filename) + super(SubwordTextEncoder, self).__init__() + + def encode(self, s): + """Converts a native string to a list of subtoken ids. + + Args: + s: a native string. + Returns: + a list of integers in the range [0, vocab_size) + """ + return self._tokens_to_subtoken_ids( + tokenizer.encode(native_to_unicode(s)) + ) + + def encode_without_tokenizing(self, token_text): + """Converts string to list of subtoken ids without calling tokenizer. + + This treats `token_text` as a single token and directly converts it + to subtoken ids. This may be useful when the default tokenizer doesn't + do what we want (e.g., when encoding text with tokens composed of lots of + nonalphanumeric characters). It is then up to the caller to make sure that + raw text is consistently converted into tokens. Only use this if you are + sure that `encode` doesn't suit your needs. + + Args: + token_text: A native string representation of a single token. + Returns: + A list of subword token ids; i.e., integers in the range [0, vocab_size). + """ + return self._tokens_to_subtoken_ids([native_to_unicode(token_text)]) + + def decode(self, ids, strip_extraneous = False): + """Converts a sequence of subtoken ids to a native string. + + Args: + ids: a list of integers in the range [0, vocab_size) + strip_extraneous: bool, whether to strip off extraneous tokens + (EOS and PAD). 
+ + Returns: + a native string + """ + if strip_extraneous: + ids = strip_ids(ids, list(range(self._num_reserved_ids or 0))) + return unicode_to_native( + tokenizer.decode(self._subtoken_ids_to_tokens(ids)) + ) + + def decode_list(self, ids): + return [self._subtoken_id_to_subtoken_string(s) for s in ids] + + @property + def vocab_size(self): + """The subtoken vocabulary size.""" + return len(self._all_subtoken_strings) + + def _tokens_to_subtoken_ids(self, tokens): + """Converts a list of tokens to a list of subtoken ids. + + Args: + tokens: a list of strings. + Returns: + a list of integers in the range [0, vocab_size) + """ + ret = [] + for token in tokens: + ret.extend(self._token_to_subtoken_ids(token)) + return ret + + def _token_to_subtoken_ids(self, token): + """Converts token to a list of subtoken ids. + + Args: + token: a string. + Returns: + a list of integers in the range [0, vocab_size) + """ + cache_location = hash(token) % self._cache_size + cache_key, cache_value = self._cache[cache_location] + if cache_key == token: + return cache_value + ret = self._escaped_token_to_subtoken_ids( + _escape_token(token, self._alphabet) + ) + self._cache[cache_location] = (token, ret) + return ret + + def _subtoken_ids_to_tokens(self, subtokens): + """Converts a list of subtoken ids to a list of tokens. + + Args: + subtokens: a list of integers in the range [0, vocab_size) + Returns: + a list of strings. 
+ """ + concatenated = ''.join( + [self._subtoken_id_to_subtoken_string(s) for s in subtokens] + ) + split = concatenated.split('_') + ret = [] + for t in split: + if t: + unescaped = _unescape_token(t + '_') + if unescaped: + ret.append(unescaped) + return ret + + def _subtoken_id_to_subtoken_string(self, subtoken): + """Converts a subtoken integer ID to a subtoken string.""" + if 0 <= subtoken < self.vocab_size: + return self._all_subtoken_strings[subtoken] + return u'' + + def _escaped_token_to_subtoken_strings(self, escaped_token): + """Converts an escaped token string to a list of subtoken strings. + + Args: + escaped_token: An escaped token as a unicode string. + Returns: + A list of subtokens as unicode strings. + """ + # NOTE: This algorithm is greedy; it won't necessarily produce the "best" + # list of subtokens. + ret = [] + start = 0 + token_len = len(escaped_token) + while start < token_len: + for end in range( + min(token_len, start + self._max_subtoken_len), start, -1 + ): + subtoken = escaped_token[start:end] + if subtoken in self._subtoken_string_to_id: + ret.append(subtoken) + start = end + break + + else: # Did not break + # If there is no possible encoding of the escaped token then one of the + # characters in the token is not in the alphabet. This should be + # impossible and would be indicative of a bug. + assert ( + False + ), 'Token substring not found in subtoken vocabulary.' + + return ret + + def _escaped_token_to_subtoken_ids(self, escaped_token): + """Converts an escaped token string to a list of subtoken IDs. + + Args: + escaped_token: An escaped token as a unicode string. + Returns: + A list of subtoken IDs as integers. 
+ """ + return [ + self._subtoken_string_to_id[subtoken] + for subtoken in self._escaped_token_to_subtoken_strings( + escaped_token + ) + ] + + @classmethod + def build_from_generator( + cls, + generator, + target_size, + max_subtoken_length = None, + reserved_tokens = None, + ): + """Builds a SubwordTextEncoder from the generated text. + + Args: + generator: yields text. + target_size: int, approximate vocabulary size to create. + max_subtoken_length: Maximum length of a subtoken. If this is not set, + then the runtime and memory use of creating the vocab is quadratic in + the length of the longest token. If this is set, then it is instead + O(max_subtoken_length * length of longest token). + reserved_tokens: List of reserved tokens. The global variable + `RESERVED_TOKENS` must be a prefix of `reserved_tokens`. If this + argument is `None`, it will use `RESERVED_TOKENS`. + + Returns: + SubwordTextEncoder with `vocab_size` approximately `target_size`. + """ + token_counts = collections.defaultdict(int) + for item in generator: + for tok in tokenizer.encode(native_to_unicode(item)): + token_counts[tok] += 1 + encoder = cls.build_to_target_size( + target_size, + token_counts, + 1, + 1e3, + max_subtoken_length = max_subtoken_length, + reserved_tokens = reserved_tokens, + ) + return encoder + + @classmethod + def build_to_target_size( + cls, + target_size, + token_counts, + min_val, + max_val, + max_subtoken_length = None, + reserved_tokens = None, + num_iterations = 4, + ): + """Builds a SubwordTextEncoder that has `vocab_size` near `target_size`. + + Uses simple recursive binary search to find a minimum token count that most + closely matches the `target_size`. + + Args: + target_size: Desired vocab_size to approximate. + token_counts: A dictionary of token counts, mapping string to int. + min_val: An integer; lower bound for the minimum token count. + max_val: An integer; upper bound for the minimum token count. + max_subtoken_length: Maximum length of a subtoken. 
If this is not set, + then the runtime and memory use of creating the vocab is quadratic in + the length of the longest token. If this is set, then it is instead + O(max_subtoken_length * length of longest token). + reserved_tokens: List of reserved tokens. The global variable + `RESERVED_TOKENS` must be a prefix of `reserved_tokens`. If this + argument is `None`, it will use `RESERVED_TOKENS`. + num_iterations: An integer; how many iterations of refinement. + + Returns: + A SubwordTextEncoder instance. + + Raises: + ValueError: If `min_val` is greater than `max_val`. + """ + if min_val > max_val: + raise ValueError( + 'Lower bound for the minimum token count ' + 'is greater than the upper bound.' + ) + if target_size < 1: + raise ValueError('Target size must be positive.') + + if reserved_tokens is None: + reserved_tokens = RESERVED_TOKENS + + def bisect(min_val, max_val): + """Bisection to find the right size.""" + present_count = (max_val + min_val) // 2 + tf.logging.info('Trying min_count %d' % present_count) + subtokenizer = cls() + subtokenizer.build_from_token_counts( + token_counts, + present_count, + num_iterations, + max_subtoken_length = max_subtoken_length, + reserved_tokens = reserved_tokens, + ) + + # Being within 1% of the target size is ok. + is_ok = ( + abs(subtokenizer.vocab_size - target_size) * 100 < target_size + ) + # If min_val == max_val, we can't do any better than this. 
+ if is_ok or min_val >= max_val or present_count < 2: + return subtokenizer + + if subtokenizer.vocab_size > target_size: + other_subtokenizer = bisect(present_count + 1, max_val) + else: + other_subtokenizer = bisect(min_val, present_count - 1) + + if other_subtokenizer is None: + return subtokenizer + + if abs(other_subtokenizer.vocab_size - target_size) < abs( + subtokenizer.vocab_size - target_size + ): + return other_subtokenizer + return subtokenizer + + return bisect(min_val, max_val) + + def build_from_token_counts( + self, + token_counts, + min_count, + num_iterations = 4, + reserved_tokens = None, + max_subtoken_length = None, + ): + """Train a SubwordTextEncoder based on a dictionary of word counts. + + Args: + token_counts: a dictionary of Unicode strings to int. + min_count: an integer - discard subtokens with lower counts. + num_iterations: an integer. how many iterations of refinement. + reserved_tokens: List of reserved tokens. The global variable + `RESERVED_TOKENS` must be a prefix of `reserved_tokens`. If this + argument is `None`, it will use `RESERVED_TOKENS`. + max_subtoken_length: Maximum length of a subtoken. If this is not set, + then the runtime and memory use of creating the vocab is quadratic in + the length of the longest token. If this is set, then it is instead + O(max_subtoken_length * length of longest token). + + Raises: + ValueError: if reserved is not 0 or len(RESERVED_TOKENS). In this case, it + is not clear what the space is being reserved for, or when it will be + filled in. + """ + if reserved_tokens is None: + reserved_tokens = RESERVED_TOKENS + else: + # There is not complete freedom in replacing RESERVED_TOKENS. + for default, proposed in zip(RESERVED_TOKENS, reserved_tokens): + if default != proposed: + raise ValueError( + 'RESERVED_TOKENS must be a prefix of ' + 'reserved_tokens.' + ) + + # Initialize the alphabet. Note, this must include reserved tokens or it can + # result in encoding failures. 
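The recursive `bisect` above searches for a `min_count` whose resulting vocabulary lands within 1% of `target_size`, keeping whichever candidate is closer when it has to choose. A standalone sketch of the same search, with a hypothetical `vocab_size_for` callable standing in for building a full `SubwordTextEncoder` at a given threshold (vocabulary size shrinks as the count threshold grows):

```python
def bisect_min_count(vocab_size_for, target_size, min_val, max_val):
    """Binary-search a min_count whose vocab size is within 1% of target.

    vocab_size_for: callable mapping min_count -> vocab size (decreasing).
    Returns (min_count, vocab_size) of the best candidate found.
    """
    present = (min_val + max_val) // 2
    size = vocab_size_for(present)
    # Being within 1% of the target size is good enough.
    if (abs(size - target_size) * 100 < target_size
            or min_val >= max_val or present < 2):
        return present, size
    if size > target_size:
        # Vocabulary too large: raise the count threshold.
        other = bisect_min_count(vocab_size_for, target_size, present + 1, max_val)
    else:
        # Vocabulary too small: lower the count threshold.
        other = bisect_min_count(vocab_size_for, target_size, min_val, present - 1)
    # Keep whichever candidate lands closer to the target size.
    if abs(other[1] - target_size) < abs(size - target_size):
        return other
    return present, size
```

For example, with `vocab_size_for = lambda c: 10000 // c` and a target of 1000, the search converges on a threshold of 10.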
+ alphabet_tokens = chain( + six.iterkeys(token_counts), + [native_to_unicode(t) for t in reserved_tokens], + ) + + self._init_alphabet_from_tokens(alphabet_tokens) + + # Bootstrap the initial list of subtokens with the characters from the + # alphabet plus the escaping characters. + self._init_subtokens_from_list( + list(self._alphabet), reserved_tokens = reserved_tokens + ) + + # We build iteratively. On each iteration, we segment all the words, + # then count the resulting potential subtokens, keeping the ones + # with high enough counts for our new vocabulary. + if min_count < 1: + min_count = 1 + for i in range(num_iterations): + tf.logging.info('Iteration {0}'.format(i)) + + # Collect all substrings of the encoded token that break along current + # subtoken boundaries. + subtoken_counts = collections.defaultdict(int) + for token, count in six.iteritems(token_counts): + iter_start_time = time.time() + escaped_token = _escape_token(token, self._alphabet) + subtokens = self._escaped_token_to_subtoken_strings( + escaped_token + ) + start = 0 + for subtoken in subtokens: + last_position = len(escaped_token) + 1 + if max_subtoken_length is not None: + last_position = min( + last_position, start + max_subtoken_length + ) + + for end in range(start + 1, last_position): + new_subtoken = escaped_token[start:end] + subtoken_counts[new_subtoken] += count + start += len(subtoken) + iter_time_secs = time.time() - iter_start_time + if iter_time_secs > 0.1: + tf.logging.info( + u'Processing token [{0}] took {1} seconds, consider ' + 'setting Text2TextProblem.max_subtoken_length to a ' + 'smaller value.'.format(token, iter_time_secs) + ) + + # Array of sets of candidate subtoken strings, by length. 
+ len_to_subtoken_strings = [] + for subtoken_string, count in six.iteritems(subtoken_counts): + lsub = len(subtoken_string) + if count >= min_count: + while len(len_to_subtoken_strings) <= lsub: + len_to_subtoken_strings.append(set()) + len_to_subtoken_strings[lsub].add(subtoken_string) + + # Consider the candidates longest to shortest, so that if we accept + # a longer subtoken string, we can decrement the counts of its prefixes. + new_subtoken_strings = [] + for lsub in range(len(len_to_subtoken_strings) - 1, 0, -1): + subtoken_strings = len_to_subtoken_strings[lsub] + for subtoken_string in subtoken_strings: + count = subtoken_counts[subtoken_string] + if count >= min_count: + # Exclude alphabet tokens here, as they must be included later, + # explicitly, regardless of count. + if subtoken_string not in self._alphabet: + new_subtoken_strings.append( + (count, subtoken_string) + ) + for l in range(1, lsub): + subtoken_counts[subtoken_string[:l]] -= count + + # Include the alphabet explicitly to guarantee all strings are encodable. + new_subtoken_strings.extend( + (subtoken_counts.get(a, 0), a) for a in self._alphabet + ) + new_subtoken_strings.sort(reverse = True) + + # Reinitialize to the candidate vocabulary. 
+ new_subtoken_strings = [ + subtoken for _, subtoken in new_subtoken_strings + ] + if reserved_tokens: + escaped_reserved_tokens = [ + _escape_token(native_to_unicode(t), self._alphabet) + for t in reserved_tokens + ] + new_subtoken_strings = ( + escaped_reserved_tokens + new_subtoken_strings + ) + + self._init_subtokens_from_list(new_subtoken_strings) + tf.logging.info('vocab_size = %d' % self.vocab_size) + + @property + def all_subtoken_strings(self): + return tuple(self._all_subtoken_strings) + + def dump(self): + """Debugging dump of the current subtoken vocabulary.""" + subtoken_strings = [ + (i, s) for s, i in six.iteritems(self._subtoken_string_to_id) + ] + print( + u', '.join( + u"{0} : '{1}'".format(i, s) for i, s in sorted(subtoken_strings) + ) + ) + + def _init_subtokens_from_list( + self, subtoken_strings, reserved_tokens = None + ): + """Initialize token information from a list of subtoken strings. + + Args: + subtoken_strings: a list of subtokens + reserved_tokens: List of reserved tokens. We must have `reserved_tokens` + as None or the empty list, or else the global variable `RESERVED_TOKENS` + must be a prefix of `reserved_tokens`. + + Raises: + ValueError: if reserved is not 0 or len(RESERVED_TOKENS). In this case, it + is not clear what the space is being reserved for, or when it will be + filled in. + """ + if reserved_tokens is None: + reserved_tokens = [] + + if reserved_tokens: + self._all_subtoken_strings = reserved_tokens + subtoken_strings + else: + self._all_subtoken_strings = subtoken_strings + + # we remember the maximum length of any subtoken to avoid having to + # check arbitrarily long strings. + self._max_subtoken_len = max([len(s) for s in subtoken_strings]) + self._subtoken_string_to_id = { + s: i + len(reserved_tokens) + for i, s in enumerate(subtoken_strings) + if s + } + # Initialize the cache to empty. 
+ self._cache_size = 2 ** 20 + self._cache = [(None, None)] * self._cache_size + + def _init_alphabet_from_tokens(self, tokens): + """Initialize alphabet from an iterable of token or subtoken strings.""" + # Include all characters from all tokens in the alphabet to guarantee that + # any token can be encoded. Additionally, include all escaping characters. + self._alphabet = {c for token in tokens for c in token} + self._alphabet |= _ESCAPE_CHARS + + def _load_from_file_object(self, f): + """Load from a file object. + + Args: + f: File object to load vocabulary from + """ + subtoken_strings = [] + for line in f: + s = line.rstrip() + # Some vocab files wrap words in single quotes, but others don't + if (s.startswith("'") and s.endswith("'")) or ( + s.startswith('"') and s.endswith('"') + ): + s = s[1:-1] + subtoken_strings.append(native_to_unicode(s)) + self._init_subtokens_from_list(subtoken_strings) + self._init_alphabet_from_tokens(subtoken_strings) + + def _load_from_file(self, filename): + """Load from a vocab file.""" + if not tf.gfile.Exists(filename): + raise ValueError('File %s not found' % filename) + with tf.gfile.Open(filename) as f: + self._load_from_file_object(f) + + def store_to_file(self, filename, add_single_quotes = True): + with tf.gfile.Open(filename, 'w') as f: + for subtoken_string in self._all_subtoken_strings: + if add_single_quotes: + f.write("'" + unicode_to_native(subtoken_string) + "'\n") + else: + f.write(unicode_to_native(subtoken_string) + '\n') + + +class ImageEncoder(object): + """Encoder class for saving and loading images.""" + + def __init__( + self, num_reserved_ids = 0, height = None, width = None, channels = 3 + ): + assert num_reserved_ids == 0 + self._height = height + self._width = width + self._channels = channels + + @property + def num_reserved_ids(self): + return 0 + + def encode(self, s): + """Transform a string with a filename into a list of RGB integers. + + Args: + s: path to the file with an image. 
+ + Returns: + ids: list of integers + """ + try: + import matplotlib.image as im # pylint: disable=g-import-not-at-top + except ImportError as e: + tf.logging.warning( + 'Reading an image requires matplotlib to be installed: %s', e + ) + raise NotImplementedError('Image reading not implemented.') + return im.imread(s) + + def decode(self, ids, strip_extraneous = False): + """Transform a sequence of int ids into an image file. + + Args: + ids: list of integers to be converted. + strip_extraneous: unused + + Returns: + Path to the temporary file where the image was saved. + + Raises: + ValueError: if the ids are not of the appropriate size. + """ + del strip_extraneous + _, tmp_file_path = tempfile.mkstemp('_decode.png') + if self._height is None or self._width is None: + size = int(math.sqrt(len(ids) / self._channels)) + length = size * size * self._channels + else: + size = None + length = self._height * self._width * self._channels + if len(ids) != length: + raise ValueError( + 'Length of ids (%d) must be height (%d) x width (%d) x ' + 'channels (%d); %d != %d.\n Ids: %s' + % ( + len(ids), + self._height, + self._width, + self._channels, + len(ids), + length, + ' '.join([str(i) for i in ids]), + ) + ) + with tf.Graph().as_default(): + raw = tf.constant(ids, dtype = tf.uint8) + if size is None: + img = tf.reshape( + raw, [self._height, self._width, self._channels] + ) + else: + img = tf.reshape(raw, [size, size, self._channels]) + png = tf.image.encode_png(img) + op = tf.write_file(tmp_file_path, png) + with tf.Session() as sess: + sess.run(op) + return tmp_file_path + + def decode_list(self, ids): + """Transform a sequence of int ids into an image file. + + Args: + ids: list of integers to be converted. + + Returns: + Singleton list: path to the temporary file where the image was saved. 
+ """ + return [self.decode(ids)] + + @property + def vocab_size(self): + return 256 + + +class RealEncoder(object): + """Encoder class for saving and loading float values.""" + + def encode(self, s): + """Transform a string (space separated float values) into a float array. + + Args: + s: space separated float values. + + Returns: + Array of float values. + """ + return [float(w) for w in s.split()] + + def decode(self, ids, strip_extraneous = False): + """Transform sequence of float values into string (float values). + + Args: + ids: array of floats to be converted. + strip_extraneous: unused + + Returns: + String having space separated float values. + + Raises: + ValueError: if the ids are not of the appropriate size. + """ + del strip_extraneous + return ' '.join([str(i) for i in ids]) diff --git a/neural-machine-translation/t/tokenizer.py b/neural-machine-translation/t/tokenizer.py new file mode 100644 index 0000000..c5199e2 --- /dev/null +++ b/neural-machine-translation/t/tokenizer.py @@ -0,0 +1,175 @@ +# coding=utf-8 +# Copyright 2020 The Tensor2Tensor Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A simple invertible tokenizer. + +Converts from a unicode string to a list of tokens +(represented as Unicode strings). + +This tokenizer has the following desirable properties: + - It is invertible. + - Alphanumeric characters are broken away from non-alphanumeric characters. + - A single space between words does not produce an extra token. 
+ - The full Unicode punctuation and separator set is recognized. + +The tokenization algorithm is as follows: + +1. Split the text into a list of tokens, splitting at every boundary of an + alphanumeric character and a non-alphanumeric character. This produces + a list which alternates between "alphanumeric tokens" + (strings of alphanumeric characters) and "non-alphanumeric tokens" + (strings of non-alphanumeric characters). + +2. Remove every token consisting of a single space, unless it is + the very first or very last token in the list. These tokens are now + implied by the fact that there are two adjacent alphanumeric tokens. + +e.g. u"Dude - that's so cool." + -> [u"Dude", u" - ", u"that", u"'", u"s", u"so", u"cool", u"."] +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import sys +import unicodedata +import six +from six.moves import range # pylint: disable=redefined-builtin +import tensorflow.compat.v1 as tf + +# Conversion between Unicode and UTF-8, if required (on Python2) +_native_to_unicode = (lambda s: s.decode('utf-8')) if six.PY2 else (lambda s: s) + + +# This set contains all letter and number characters. +_ALPHANUMERIC_CHAR_SET = set( + six.unichr(i) + for i in range(sys.maxunicode) + if ( + unicodedata.category(six.unichr(i)).startswith('L') + or unicodedata.category(six.unichr(i)).startswith('N') + ) +) + + +def encode(text): + """Encode a unicode string as a list of tokens. 
+ + Args: + text: a unicode string + Returns: + a list of tokens as Unicode strings + """ + if not text: + return [] + ret = [] + token_start = 0 + # Classify each character in the input string + is_alnum = [c in _ALPHANUMERIC_CHAR_SET for c in text] + for pos in range(1, len(text)): + if is_alnum[pos] != is_alnum[pos - 1]: + token = text[token_start:pos] + if token != u' ' or token_start == 0: + ret.append(token) + token_start = pos + final_token = text[token_start:] + ret.append(final_token) + return ret + + +def decode(tokens): + """Decode a list of tokens to a unicode string. + + Args: + tokens: a list of Unicode strings + Returns: + a unicode string + """ + token_is_alnum = [t[0] in _ALPHANUMERIC_CHAR_SET for t in tokens] + ret = [] + for i, token in enumerate(tokens): + if i > 0 and token_is_alnum[i - 1] and token_is_alnum[i]: + ret.append(u' ') + ret.append(token) + return ''.join(ret) + + +def _read_filepattern(filepattern, max_lines = None, split_on_newlines = True): + """Reads files matching a wildcard pattern, yielding the contents. + + Args: + filepattern: A wildcard pattern matching one or more files. + max_lines: If set, stop reading after reading this many lines. + split_on_newlines: A boolean. If true, then split files by lines and strip + leading and trailing whitespace from each line. Otherwise, treat each + file as a single string. + + Yields: + The contents of the files as lines, if split_on_newlines is True, or + the entire contents of each file if False. 
+ """ + filenames = sorted(tf.gfile.Glob(filepattern)) + lines_read = 0 + for filename in filenames: + with tf.gfile.Open(filename) as f: + if split_on_newlines: + for line in f: + yield line.strip() + lines_read += 1 + if max_lines and lines_read >= max_lines: + return + + else: + if max_lines: + doc = [] + for line in f: + doc.append(line) + lines_read += 1 + if max_lines and lines_read >= max_lines: + yield ''.join(doc) + return + yield ''.join(doc) + + else: + yield f.read() + + +def vocab_token_counts(text_filepattern, max_lines): + """Read a vocab file and return a dictionary of token counts. + + Reads a two-column CSV file of tokens and their frequency in a dataset. The + tokens are presumed to be generated by encode() or the equivalent. + + Args: + text_filepattern: A pattern matching one or more files. + max_lines: An integer; maximum total lines to read. + + Returns: + a dictionary mapping token to count. + """ + ret = {} + for i, line in enumerate( + _read_filepattern(text_filepattern, max_lines = max_lines) + ): + if ',' not in line: + tf.logging.warning("Malformed vocab line #%d '%s'", i, line) + continue + + token, count = line.rsplit(',', 1) + ret[_native_to_unicode(token)] = int(count) + + return ret diff --git a/neural-machine-translation/transformer/attention_layer.py b/neural-machine-translation/transformer/attention_layer.py new file mode 100644 index 0000000..9e9ca28 --- /dev/null +++ b/neural-machine-translation/transformer/attention_layer.py @@ -0,0 +1,159 @@ +# Copyright 2018 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Implementation of multiheaded attention and self-attention layers."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+
+class Attention(tf.layers.Layer):
+ """Multi-headed attention layer."""
+
+ def __init__(self, hidden_size, num_heads, attention_dropout, train):
+ if hidden_size % num_heads != 0:
+ raise ValueError(
+ 'Hidden size must be evenly divisible by the number of '
+ 'heads.'
+ )
+
+ super(Attention, self).__init__()
+ self.hidden_size = hidden_size
+ self.num_heads = num_heads
+ self.attention_dropout = attention_dropout
+ self.train = train
+
+ # Layers for linearly projecting the queries, keys, and values.
+ self.q_dense_layer = tf.layers.Dense(
+ hidden_size, use_bias = False, name = 'q'
+ )
+ self.k_dense_layer = tf.layers.Dense(
+ hidden_size, use_bias = False, name = 'k'
+ )
+ self.v_dense_layer = tf.layers.Dense(
+ hidden_size, use_bias = False, name = 'v'
+ )
+
+ self.output_dense_layer = tf.layers.Dense(
+ hidden_size, use_bias = False, name = 'output_transform'
+ )
+
+ def split_heads(self, x):
+ """Split x into different heads, and transpose the resulting value.
+
+ The tensor is transposed to ensure the inner dimensions hold the correct
+ values during the matrix multiplication.
+ + Args: + x: A tensor with shape [batch_size, length, hidden_size] + + Returns: + A tensor with shape [batch_size, num_heads, length, hidden_size/num_heads] + """ + with tf.name_scope('split_heads'): + batch_size = tf.shape(x)[0] + length = tf.shape(x)[1] + + # Calculate depth of last dimension after it has been split. + depth = self.hidden_size // self.num_heads + + # Split the last dimension + x = tf.reshape(x, [batch_size, length, self.num_heads, depth]) + + # Transpose the result + return tf.transpose(x, [0, 2, 1, 3]) + + def combine_heads(self, x): + """Combine tensor that has been split. + + Args: + x: A tensor [batch_size, num_heads, length, hidden_size/num_heads] + + Returns: + A tensor with shape [batch_size, length, hidden_size] + """ + with tf.name_scope('combine_heads'): + batch_size = tf.shape(x)[0] + length = tf.shape(x)[2] + x = tf.transpose( + x, [0, 2, 1, 3] + ) # --> [batch, length, num_heads, depth] + return tf.reshape(x, [batch_size, length, self.hidden_size]) + + def call(self, x, y, bias, cache = None): + """Apply attention mechanism to x and y. + + Args: + x: a tensor with shape [batch_size, length_x, hidden_size] + y: a tensor with shape [batch_size, length_y, hidden_size] + bias: attention bias that will be added to the result of the dot product. + cache: (Used during prediction) dictionary with tensors containing results + of previous attentions. The dictionary must have the items: + {"k": tensor with shape [batch_size, i, key_channels], + "v": tensor with shape [batch_size, i, value_channels]} + where i is the current decoded length. + + Returns: + Attention layer output with shape [batch_size, length_x, hidden_size] + """ + # Linearly project the query (q), key (k) and value (v) using different + # learned projections. This is in preparation of splitting them into + # multiple heads. Multi-head attention uses multiple queries, keys, and + # values rather than regular attention (which uses a single q, k, v). 
+ q = self.q_dense_layer(x) + k = self.k_dense_layer(y) + v = self.v_dense_layer(y) + + if cache is not None: + # Combine cached keys and values with new keys and values. + k = tf.concat([cache['k'], k], axis = 1) + v = tf.concat([cache['v'], v], axis = 1) + + # Update cache + cache['k'] = k + cache['v'] = v + + # Split q, k, v into heads. + q = self.split_heads(q) + k = self.split_heads(k) + v = self.split_heads(v) + + # Scale q to prevent the dot product between q and k from growing too large. + depth = self.hidden_size // self.num_heads + q *= depth ** -0.5 + + # Calculate dot product attention + logits = tf.matmul(q, k, transpose_b = True) + logits += bias + weights = tf.nn.softmax(logits, name = 'attention_weights') + if self.train: + weights = tf.nn.dropout(weights, 1.0 - self.attention_dropout) + attention_output = tf.matmul(weights, v) + + # Recombine heads --> [batch_size, length, hidden_size] + attention_output = self.combine_heads(attention_output) + + # Run the combined outputs through another linear projection layer. + attention_output = self.output_dense_layer(attention_output) + return attention_output + + +class SelfAttention(Attention): + """Multiheaded self-attention layer.""" + + def call(self, x, bias, cache = None): + return super(SelfAttention, self).call(x, x, bias, cache) diff --git a/neural-machine-translation/transformer/beam_search.py b/neural-machine-translation/transformer/beam_search.py new file mode 100644 index 0000000..389d257 --- /dev/null +++ b/neural-machine-translation/transformer/beam_search.py @@ -0,0 +1,611 @@ +# Copyright 2018 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Beam search to find the translated sequence with the highest probability. + +Source implementation from Tensor2Tensor: +https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/beam_search.py +""" + +import tensorflow as tf +from tensorflow.python.util import nest + +# Default value for INF +INF = 1.0 * 1e7 + + +class _StateKeys(object): + """Keys to dictionary storing the state of the beam search loop.""" + + # Variable storing the loop index. + CUR_INDEX = 'CUR_INDEX' + + # Top sequences that are alive for each batch item. Alive sequences are ones + # that have not generated an EOS token. Sequences that reach EOS are marked as + # finished and moved to the FINISHED_SEQ tensor. + # Has shape [batch_size, beam_size, CUR_INDEX + 1] + ALIVE_SEQ = 'ALIVE_SEQ' + # Log probabilities of each alive sequence. Shape [batch_size, beam_size] + ALIVE_LOG_PROBS = 'ALIVE_LOG_PROBS' + # Dictionary of cached values for each alive sequence. The cache stores + # the encoder output, attention bias, and the decoder attention output from + # the previous iteration. + ALIVE_CACHE = 'ALIVE_CACHE' + + # Top finished sequences for each batch item. + # Has shape [batch_size, beam_size, CUR_INDEX + 1]. Sequences that are + # shorter than CUR_INDEX + 1 are padded with 0s. + FINISHED_SEQ = 'FINISHED_SEQ' + # Scores for each finished sequence. 
Score = log probability / length norm + # Shape [batch_size, beam_size] + FINISHED_SCORES = 'FINISHED_SCORES' + # Flags indicating which sequences in the finished sequences are finished. + # At the beginning, all of the sequences in FINISHED_SEQ are filler values. + # True -> finished sequence, False -> filler. Shape [batch_size, beam_size] + FINISHED_FLAGS = 'FINISHED_FLAGS' + + +class SequenceBeamSearch(object): + """Implementation of beam search loop.""" + + def __init__( + self, + symbols_to_logits_fn, + vocab_size, + batch_size, + beam_size, + alpha, + max_decode_length, + eos_id, + ): + self.symbols_to_logits_fn = symbols_to_logits_fn + self.vocab_size = vocab_size + self.batch_size = batch_size + self.beam_size = beam_size + self.alpha = alpha + self.max_decode_length = max_decode_length + self.eos_id = eos_id + + def search(self, initial_ids, initial_cache): + """Beam search for sequences with highest scores.""" + state, state_shapes = self._create_initial_state( + initial_ids, initial_cache + ) + + finished_state = tf.while_loop( + self._continue_search, + self._search_step, + loop_vars = [state], + shape_invariants = [state_shapes], + parallel_iterations = 1, + back_prop = False, + ) + finished_state = finished_state[0] + + alive_seq = finished_state[_StateKeys.ALIVE_SEQ] + alive_log_probs = finished_state[_StateKeys.ALIVE_LOG_PROBS] + finished_seq = finished_state[_StateKeys.FINISHED_SEQ] + finished_scores = finished_state[_StateKeys.FINISHED_SCORES] + finished_flags = finished_state[_StateKeys.FINISHED_FLAGS] + + # Account for corner case where there are no finished sequences for a + # particular batch item. In that case, return alive sequences for that batch + # item. 
+ finished_seq = tf.where( + tf.reduce_any(finished_flags, 1), finished_seq, alive_seq + ) + finished_scores = tf.where( + tf.reduce_any(finished_flags, 1), finished_scores, alive_log_probs + ) + return finished_seq, finished_scores + + def _create_initial_state(self, initial_ids, initial_cache): + """Return initial state dictionary and its shape invariants. + + Args: + initial_ids: initial ids to pass into the symbols_to_logits_fn. + int tensor with shape [batch_size, 1] + initial_cache: dictionary storing values to be passed into the + symbols_to_logits_fn. + + Returns: + state and shape invariant dictionaries with keys from _StateKeys + """ + # Current loop index (starts at 0) + cur_index = tf.constant(0) + + # Create alive sequence with shape [batch_size, beam_size, 1] + alive_seq = _expand_to_beam_size(initial_ids, self.beam_size) + alive_seq = tf.expand_dims(alive_seq, axis = 2) + + # Create tensor for storing initial log probabilities. + # Assume initial_ids are prob 1.0 + initial_log_probs = tf.constant( + [[0.0] + [-float('inf')] * (self.beam_size - 1)] + ) + alive_log_probs = tf.tile(initial_log_probs, [self.batch_size, 1]) + + # Expand all values stored in the dictionary to the beam size, so that each + # beam has a separate cache. + alive_cache = nest.map_structure( + lambda t: _expand_to_beam_size(t, self.beam_size), initial_cache + ) + + # Initialize tensor storing finished sequences with filler values. + finished_seq = tf.zeros(tf.shape(alive_seq), tf.int32) + + # Set scores of the initial finished seqs to negative infinity. + finished_scores = tf.ones([self.batch_size, self.beam_size]) * -INF + + # Initialize finished flags with all False values. 
+ finished_flags = tf.zeros([self.batch_size, self.beam_size], tf.bool)
+
+ # Create state dictionary
+ state = {
+ _StateKeys.CUR_INDEX: cur_index,
+ _StateKeys.ALIVE_SEQ: alive_seq,
+ _StateKeys.ALIVE_LOG_PROBS: alive_log_probs,
+ _StateKeys.ALIVE_CACHE: alive_cache,
+ _StateKeys.FINISHED_SEQ: finished_seq,
+ _StateKeys.FINISHED_SCORES: finished_scores,
+ _StateKeys.FINISHED_FLAGS: finished_flags,
+ }
+
+ # Create state invariants for each value in the state dictionary. Each
+ # dimension must be a constant or None. A None dimension means either:
+ # 1) the dimension's value is a tensor that remains the same but may
+ # depend on the input sequence to the model (e.g. batch size).
+ # 2) the dimension may have different values on different iterations.
+ state_shape_invariants = {
+ _StateKeys.CUR_INDEX: tf.TensorShape([]),
+ _StateKeys.ALIVE_SEQ: tf.TensorShape([None, self.beam_size, None]),
+ _StateKeys.ALIVE_LOG_PROBS: tf.TensorShape([None, self.beam_size]),
+ _StateKeys.ALIVE_CACHE: nest.map_structure(
+ _get_shape_keep_last_dim, alive_cache
+ ),
+ _StateKeys.FINISHED_SEQ: tf.TensorShape(
+ [None, self.beam_size, None]
+ ),
+ _StateKeys.FINISHED_SCORES: tf.TensorShape([None, self.beam_size]),
+ _StateKeys.FINISHED_FLAGS: tf.TensorShape([None, self.beam_size]),
+ }
+
+ return state, state_shape_invariants
+
+ def _continue_search(self, state):
+ """Return whether to continue the search loop.
+
+ The loop should terminate
+ 1) when the decode length has been reached, or
+ 2) when the worst score in the finished sequences is better than the best
+ score in the alive sequences (i.e. the finished sequences are provably
+ unchanging)
+
+ Args:
+ state: A dictionary with the current loop state.
+
+ Returns:
+ Bool tensor with value True if loop should continue, False if loop should
+ terminate.
+ """ + i = state[_StateKeys.CUR_INDEX] + alive_log_probs = state[_StateKeys.ALIVE_LOG_PROBS] + finished_scores = state[_StateKeys.FINISHED_SCORES] + finished_flags = state[_StateKeys.FINISHED_FLAGS] + + not_at_max_decode_length = tf.less(i, self.max_decode_length) + + # Calculate largest length penalty (the larger penalty, the better score). + max_length_norm = _length_normalization( + self.alpha, self.max_decode_length + ) + # Get the best possible scores from alive sequences. + best_alive_scores = alive_log_probs[:, 0] / max_length_norm + + # Compute worst score in finished sequences for each batch element + finished_scores *= tf.to_float( + finished_flags + ) # set filler scores to zero + lowest_finished_scores = tf.reduce_min(finished_scores, axis = 1) + + # If there are no finished sequences in a batch element, then set the lowest + # finished score to -INF for that element. + finished_batches = tf.reduce_any(finished_flags, 1) + lowest_finished_scores += (1.0 - tf.to_float(finished_batches)) * -INF + + worst_finished_score_better_than_best_alive_score = tf.reduce_all( + tf.greater(lowest_finished_scores, best_alive_scores) + ) + + return tf.logical_and( + not_at_max_decode_length, + tf.logical_not(worst_finished_score_better_than_best_alive_score), + ) + + def _search_step(self, state): + """Beam search loop body. + + Grow alive sequences by a single ID. Sequences that have reached the EOS + token are marked as finished. The alive and finished sequences with the + highest log probabilities and scores are returned. + + A sequence's finished score is calculating by dividing the log probability + by the length normalization factor. Without length normalization, the + search is more likely to return shorter sequences. + + Args: + state: A dictionary with the current loop state. + + Returns: + new state dictionary. + """ + # Grow alive sequences by one token. 
+        new_seq, new_log_probs, new_cache = self._grow_alive_seq(state)
+        # Collect top beam_size alive sequences.
+        alive_state = self._get_new_alive_state(
+            new_seq, new_log_probs, new_cache
+        )
+
+        # Combine newly finished sequences with existing finished sequences,
+        # and collect the top k scoring sequences.
+        finished_state = self._get_new_finished_state(
+            state, new_seq, new_log_probs
+        )
+
+        # Increment loop index and create new state dictionary.
+        new_state = {_StateKeys.CUR_INDEX: state[_StateKeys.CUR_INDEX] + 1}
+        new_state.update(alive_state)
+        new_state.update(finished_state)
+        return [new_state]
+
+    def _grow_alive_seq(self, state):
+        """Grow alive sequences by one token, and collect top 2*beam_size
+        sequences.
+
+        2*beam_size sequences are collected because some sequences may have
+        reached the EOS token. 2*beam_size ensures that at least beam_size
+        sequences are still alive.
+
+        Args:
+          state: A dictionary with the current loop state.
+        Returns:
+          Tuple of
+            (Top 2*beam_size sequences [batch_size, 2 * beam_size, cur_index + 1],
+             Scores of returned sequences [batch_size, 2 * beam_size],
+             New alive cache, for each of the 2 * beam_size sequences)
+        """
+        i = state[_StateKeys.CUR_INDEX]
+        alive_seq = state[_StateKeys.ALIVE_SEQ]
+        alive_log_probs = state[_StateKeys.ALIVE_LOG_PROBS]
+        alive_cache = state[_StateKeys.ALIVE_CACHE]
+
+        beams_to_keep = 2 * self.beam_size
+
+        # Get logits for the next candidate IDs for the alive sequences. Get
+        # the new cache values at the same time.
+        flat_ids = _flatten_beam_dim(alive_seq)  # [batch_size * beam_size]
+        flat_cache = nest.map_structure(_flatten_beam_dim, alive_cache)
+
+        flat_logits, flat_cache = self.symbols_to_logits_fn(
+            flat_ids, i, flat_cache
+        )
+
+        # Unflatten logits to shape [batch_size, beam_size, vocab_size]
+        logits = _unflatten_beam_dim(
+            flat_logits, self.batch_size, self.beam_size
+        )
+        new_cache = nest.map_structure(
+            lambda t: _unflatten_beam_dim(t, self.batch_size, self.beam_size),
+            flat_cache,
+        )
+
+        # Convert logits to normalized log probs.
+        candidate_log_probs = _log_prob_from_logits(logits)
+
+        # Calculate new log probabilities if each of the alive sequences were
+        # extended by the candidate IDs.
+        # Shape [batch_size, beam_size, vocab_size]
+        log_probs = candidate_log_probs + tf.expand_dims(
+            alive_log_probs, axis = 2
+        )
+
+        # Each batch item has beam_size * vocab_size candidate sequences. For
+        # each batch item, get the k candidates with the highest log
+        # probabilities.
+        flat_log_probs = tf.reshape(
+            log_probs, [-1, self.beam_size * self.vocab_size]
+        )
+        topk_log_probs, topk_indices = tf.nn.top_k(
+            flat_log_probs, k = beams_to_keep
+        )
+
+        # Extract the alive sequences that generate the highest log
+        # probabilities after being extended.
+        topk_beam_indices = topk_indices // self.vocab_size
+        topk_seq, new_cache = _gather_beams(
+            [alive_seq, new_cache],
+            topk_beam_indices,
+            self.batch_size,
+            beams_to_keep,
+        )
+
+        # Append the most probable IDs to the topk sequences.
+        topk_ids = topk_indices % self.vocab_size
+        topk_ids = tf.expand_dims(topk_ids, axis = 2)
+        topk_seq = tf.concat([topk_seq, topk_ids], axis = 2)
+        return topk_seq, topk_log_probs, new_cache
+
+    def _get_new_alive_state(self, new_seq, new_log_probs, new_cache):
+        """Gather the top k sequences that are still alive.
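For intuition, the flattened top-k trick in `_grow_alive_seq` (every `(beam, token)` pair scored in one flat list, then each flat index decoded with `//` and `%`) can be sketched for a single batch element in plain Python (hypothetical helper, no TF):

```python
def top_candidates(alive_log_probs, candidate_log_probs, vocab_size, k):
    # Score every (beam, token) extension; flat index i maps to
    # beam i // vocab_size and token i % vocab_size, exactly as in
    # topk_beam_indices / topk_ids above.
    flat = [
        alive_log_probs[i // vocab_size]
        + candidate_log_probs[i // vocab_size][i % vocab_size]
        for i in range(len(alive_log_probs) * vocab_size)
    ]
    # The real code keeps 2*beam_size candidates so that at least beam_size
    # survive even if some just emitted EOS; here we just take the top k.
    top = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:k]
    return [(i // vocab_size, i % vocab_size) for i in top]
```

For two beams with log probs `[0.0, -1.0]` over a vocab of 3, the two best extensions come from beam 0's best token and beam 1's best token.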
+
+        Args:
+          new_seq: New sequences generated by growing the current alive
+            sequences, int32 tensor with shape
+            [batch_size, 2 * beam_size, cur_index + 1]
+          new_log_probs: Log probabilities of new sequences, float32 tensor
+            with shape [batch_size, beam_size]
+          new_cache: Dict of cached values for each sequence.
+
+        Returns:
+          Dictionary with alive keys from _StateKeys:
+            {Top beam_size sequences that are still alive (don't end with
+             eos_id),
+             Log probabilities of top alive sequences,
+             Dict cache storing decoder states for top alive sequences}
+        """
+        # To prevent finished sequences from being considered, set log probs
+        # to -INF.
+        new_finished_flags = tf.equal(new_seq[:, :, -1], self.eos_id)
+        new_log_probs += tf.to_float(new_finished_flags) * -INF
+
+        top_alive_seq, top_alive_log_probs, top_alive_cache = _gather_topk_beams(
+            [new_seq, new_log_probs, new_cache],
+            new_log_probs,
+            self.batch_size,
+            self.beam_size,
+        )
+
+        return {
+            _StateKeys.ALIVE_SEQ: top_alive_seq,
+            _StateKeys.ALIVE_LOG_PROBS: top_alive_log_probs,
+            _StateKeys.ALIVE_CACHE: top_alive_cache,
+        }
+
+    def _get_new_finished_state(self, state, new_seq, new_log_probs):
+        """Combine new and old finished sequences, and gather the top k
+        sequences.
+
+        Args:
+          state: A dictionary with the current loop state.
+          new_seq: New sequences generated by growing the current alive
+            sequences, int32 tensor with shape [batch_size, beam_size, i + 1]
+          new_log_probs: Log probabilities of new sequences, float32 tensor
+            with shape [batch_size, beam_size]
+
+        Returns:
+          Dictionary with finished keys from _StateKeys:
+            {Top beam_size finished sequences based on score,
+             Scores of finished sequences,
+             Finished flags of finished sequences}
+        """
+        i = state[_StateKeys.CUR_INDEX]
+        finished_seq = state[_StateKeys.FINISHED_SEQ]
+        finished_scores = state[_StateKeys.FINISHED_SCORES]
+        finished_flags = state[_StateKeys.FINISHED_FLAGS]
+
+        # First append a column of 0-ids to finished_seq to increment the
+        # length.
+        # New shape of finished_seq: [batch_size, beam_size, i + 1]
+        finished_seq = tf.concat(
+            [
+                finished_seq,
+                tf.zeros([self.batch_size, self.beam_size, 1], tf.int32),
+            ],
+            axis = 2,
+        )
+
+        # Calculate new seq scores from log probabilities.
+        length_norm = _length_normalization(self.alpha, i + 1)
+        new_scores = new_log_probs / length_norm
+
+        # Set the scores of the still-alive seq in new_seq to large negative
+        # values.
+        new_finished_flags = tf.equal(new_seq[:, :, -1], self.eos_id)
+        new_scores += (1.0 - tf.to_float(new_finished_flags)) * -INF
+
+        # Combine sequences, scores, and flags.
+        finished_seq = tf.concat([finished_seq, new_seq], axis = 1)
+        finished_scores = tf.concat([finished_scores, new_scores], axis = 1)
+        finished_flags = tf.concat(
+            [finished_flags, new_finished_flags], axis = 1
+        )
+
+        # Return the finished sequences with the best scores.
+        top_finished_seq, top_finished_scores, top_finished_flags = _gather_topk_beams(
+            [finished_seq, finished_scores, finished_flags],
+            finished_scores,
+            self.batch_size,
+            self.beam_size,
+        )
+
+        return {
+            _StateKeys.FINISHED_SEQ: top_finished_seq,
+            _StateKeys.FINISHED_SCORES: top_finished_scores,
+            _StateKeys.FINISHED_FLAGS: top_finished_flags,
+        }
+
+
+def sequence_beam_search(
+    symbols_to_logits_fn,
+    initial_ids,
+    initial_cache,
+    vocab_size,
+    beam_size,
+    alpha,
+    max_decode_length,
+    eos_id,
+):
+    """Search for sequence of subtoken ids with the largest probability.
+
+    Args:
+      symbols_to_logits_fn: A function that takes in ids, index, and cache as
+        arguments. The passed in arguments will have shape:
+          ids -> [batch_size * beam_size, index]
+          index -> [] (scalar)
+          cache -> nested dictionary of tensors [batch_size * beam_size, ...]
+        The function must return logits and new cache.
+          logits -> [batch_size * beam_size, vocab_size]
+          new cache -> same shape/structure as inputted cache
+      initial_ids: Starting ids for each batch item.
+        int32 tensor with shape [batch_size]
+      initial_cache: dict containing starting decoder variables information
+      vocab_size: int size of tokens
+      beam_size: int number of beams
+      alpha: float defining the strength of length normalization
+      max_decode_length: maximum length of the decoded sequence
+      eos_id: int id of eos token, used to determine when a sequence has
+        finished
+
+    Returns:
+      Top decoded sequences [batch_size, beam_size, max_decode_length]
+      sequence scores [batch_size, beam_size]
+    """
+    batch_size = tf.shape(initial_ids)[0]
+    sbs = SequenceBeamSearch(
+        symbols_to_logits_fn,
+        vocab_size,
+        batch_size,
+        beam_size,
+        alpha,
+        max_decode_length,
+        eos_id,
+    )
+    return sbs.search(initial_ids, initial_cache)
+
+
+def _log_prob_from_logits(logits):
+    return logits - tf.reduce_logsumexp(logits, axis = 2, keep_dims = True)
+
+
+def _length_normalization(alpha, length):
+    """Return length normalization factor."""
+    return tf.pow(((5.0 + tf.to_float(length)) / 6.0), alpha)
+
+
+def _expand_to_beam_size(tensor, beam_size):
+    """Tiles a given tensor by beam_size.
+
+    Args:
+      tensor: tensor to tile [batch_size, ...]
+      beam_size: How much to tile the tensor by.
+
+    Returns:
+      Tiled tensor [batch_size, beam_size, ...]
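The two scoring helpers above reduce to a few lines of plain Python; this sketch (illustrative, one row instead of a 3-D tensor) shows the log-softmax performed by `_log_prob_from_logits` and the `((5 + length) / 6) ** alpha` penalty from `_length_normalization`:

```python
import math

def log_prob_from_logits(logits):
    # Subtract logsumexp (computed stably via the max) so the row becomes
    # normalized log probabilities: exp of the results sums to 1.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def length_normalization(alpha, length):
    # Length penalty from the GNMT paper; alpha = 0 disables it.
    return ((5.0 + length) / 6.0) ** alpha
```

Dividing a finished sequence's log probability by this factor is what keeps beam search from systematically preferring short outputs.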
+ """ + tensor = tf.expand_dims(tensor, axis = 1) + tile_dims = [1] * tensor.shape.ndims + tile_dims[1] = beam_size + + return tf.tile(tensor, tile_dims) + + +def _shape_list(tensor): + """Return a list of the tensor's shape, and ensure no None values in list.""" + # Get statically known shape (may contain None's for unknown dimensions) + shape = tensor.get_shape().as_list() + + # Ensure that the shape values are not None + dynamic_shape = tf.shape(tensor) + for i in range(len(shape)): # pylint: disable=consider-using-enumerate + if shape[i] is None: + shape[i] = dynamic_shape[i] + return shape + + +def _get_shape_keep_last_dim(tensor): + shape_list = _shape_list(tensor) + + # Only the last + for i in range(len(shape_list) - 1): + shape_list[i] = None + + if isinstance(shape_list[-1], tf.Tensor): + shape_list[-1] = None + return tf.TensorShape(shape_list) + + +def _flatten_beam_dim(tensor): + """Reshapes first two dimensions in to single dimension. + + Args: + tensor: Tensor to reshape of shape [A, B, ...] + + Returns: + Reshaped tensor of shape [A*B, ...] + """ + shape = _shape_list(tensor) + shape[0] *= shape[1] + shape.pop(1) # Remove beam dim + return tf.reshape(tensor, shape) + + +def _unflatten_beam_dim(tensor, batch_size, beam_size): + """Reshapes first dimension back to [batch_size, beam_size]. + + Args: + tensor: Tensor to reshape of shape [batch_size*beam_size, ...] + batch_size: Tensor, original batch size. + beam_size: int, original beam size. + + Returns: + Reshaped tensor of shape [batch_size, beam_size, ...] + """ + shape = _shape_list(tensor) + new_shape = [batch_size, beam_size] + shape[1:] + return tf.reshape(tensor, new_shape) + + +def _gather_beams(nested, beam_indices, batch_size, new_beam_size): + """Gather beams from nested structure of tensors. + + Each tensor in nested represents a batch of beams, where beam refers to a + single search state (beam search involves searching through multiple states + in parallel). 
+
+    This function is used to gather the top beams, specified by
+    beam_indices, from the nested tensors.
+
+    Args:
+      nested: Nested structure (tensor, list, tuple or dict) containing
+        tensors with shape [batch_size, beam_size, ...].
+      beam_indices: int32 tensor with shape [batch_size, new_beam_size]. Each
+        value in beam_indices must be between [0, beam_size); the values are
+        not necessarily unique.
+      batch_size: int size of batch
+      new_beam_size: int number of beams to be pulled from the nested tensors.
+
+    Returns:
+      Nested structure containing tensors with shape
+        [batch_size, new_beam_size, ...]
+    """
+    # Compute the i'th coordinate that contains the batch index for gather_nd.
+    # Batch pos is a tensor like [[0,0,0,0,],[1,1,1,1],..].
+    batch_pos = tf.range(batch_size * new_beam_size) // new_beam_size
+    batch_pos = tf.reshape(batch_pos, [batch_size, new_beam_size])
+
+    # Create coordinates to be passed to tf.gather_nd. Stacking creates a
+    # tensor with shape [batch_size, beam_size, 2], where the last dimension
+    # contains the (i, j) gathering coordinates.
+    coordinates = tf.stack([batch_pos, beam_indices], axis = 2)
+
+    return nest.map_structure(
+        lambda state: tf.gather_nd(state, coordinates), nested
+    )
+
+
+def _gather_topk_beams(nested, score_or_log_prob, batch_size, beam_size):
+    """Gather top beams from nested structure."""
+    _, topk_indexes = tf.nn.top_k(score_or_log_prob, k = beam_size)
+    return _gather_beams(nested, topk_indexes, batch_size, beam_size)
diff --git a/neural-machine-translation/transformer/embedding_layer.py b/neural-machine-translation/transformer/embedding_layer.py
new file mode 100644
index 0000000..43a3d52
--- /dev/null
+++ b/neural-machine-translation/transformer/embedding_layer.py
@@ -0,0 +1,109 @@
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Implementation of embedding layer with shared weights."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf  # pylint: disable=g-bad-import-order
+
+
+class EmbeddingSharedWeights(tf.layers.Layer):
+    """Calculates input embeddings and pre-softmax linear with shared weights."""
+
+    def __init__(self, vocab_size, hidden_size, method = 'gather'):
+        """Specify characteristic parameters of embedding layer.
+
+        Args:
+          vocab_size: Number of tokens in the embedding. (Typically ~32,000)
+          hidden_size: Dimensionality of the embedding. (Typically 512 or 1024)
+          method: Strategy for performing embedding lookup. "gather" uses
+            tf.gather, which performs well on CPUs and GPUs but very poorly on
+            TPUs. "matmul" one-hot encodes the indices and formulates the
+            embedding as a sparse matrix multiplication. The matmul
+            formulation is wasteful as it does extra work; however, matrix
+            multiplication is very fast on TPUs, which makes "matmul"
+            considerably faster than "gather" on TPUs.
+        """
+        super(EmbeddingSharedWeights, self).__init__()
+        self.vocab_size = vocab_size
+        self.hidden_size = hidden_size
+        if method not in ('gather', 'matmul'):
+            raise ValueError(
+                "method {} must be 'gather' or 'matmul'".format(method)
+            )
+        self.method = method
+
+    def build(self, _):
+        with tf.variable_scope('embedding_and_softmax', reuse = tf.AUTO_REUSE):
+            # Create and initialize weights. The random normal initializer
+            # was chosen randomly, and works well.
+            self.shared_weights = tf.get_variable(
+                'weights',
+                [self.vocab_size, self.hidden_size],
+                initializer = tf.random_normal_initializer(
+                    0.0, self.hidden_size ** -0.5
+                ),
+            )
+
+        self.built = True
+
+    def call(self, x):
+        """Get token embeddings of x.
+
+        Args:
+          x: An int64 tensor with shape [batch_size, length]
+        Returns:
+          embeddings: float32 tensor with shape
+            [batch_size, length, embedding_size]
+          padding: float32 tensor with shape [batch_size, length] indicating
+            the locations of the padding tokens in x.
+        """
+        with tf.name_scope('embedding'):
+            # Create binary mask of size [batch_size, length]
+            mask = tf.to_float(tf.not_equal(x, 0))
+
+            if self.method == 'gather':
+                embeddings = tf.gather(self.shared_weights, x)
+                embeddings *= tf.expand_dims(mask, -1)
+            else:  # matmul
+                # NOTE: `tpu_utils` is not imported in this file; the "matmul"
+                # path assumes the embedding_matmul helper from the TF official
+                # models repo is available.
+                embeddings = tpu_utils.embedding_matmul(
+                    embedding_table = self.shared_weights,
+                    values = tf.cast(x, dtype = tf.int32),
+                    mask = mask,
+                )
+                # embedding_matmul already zeros out masked positions, so
+                # `embeddings *= tf.expand_dims(mask, -1)` is unnecessary.
+
+            # Scale embedding by the sqrt of the hidden size
+            embeddings *= self.hidden_size ** 0.5
+
+            return embeddings
+
+    def linear(self, x):
+        """Computes logits by running x through a linear layer.
+
+        Args:
+          x: A float32 tensor with shape [batch_size, length, hidden_size]
+        Returns:
+          float32 tensor with shape [batch_size, length, vocab_size].
+ """ + with tf.name_scope('presoftmax_linear'): + batch_size = tf.shape(x)[0] + length = tf.shape(x)[1] + + x = tf.reshape(x, [-1, self.hidden_size]) + logits = tf.matmul(x, self.shared_weights, transpose_b = True) + + return tf.reshape(logits, [batch_size, length, self.vocab_size]) diff --git a/neural-machine-translation/transformer/ffn_layer.py b/neural-machine-translation/transformer/ffn_layer.py new file mode 100644 index 0000000..2d837cf --- /dev/null +++ b/neural-machine-translation/transformer/ffn_layer.py @@ -0,0 +1,98 @@ +# Copyright 2018 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ==============================================================================
+"""Implementation of fully connected network."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf
+
+
+class FeedFowardNetwork(tf.layers.Layer):
+    """Fully connected feedforward network."""
+
+    def __init__(
+        self, hidden_size, filter_size, relu_dropout, train, allow_pad
+    ):
+        super(FeedFowardNetwork, self).__init__()
+        self.hidden_size = hidden_size
+        self.filter_size = filter_size
+        self.relu_dropout = relu_dropout
+        self.train = train
+        self.allow_pad = allow_pad
+
+        self.filter_dense_layer = tf.layers.Dense(
+            filter_size,
+            use_bias = True,
+            activation = tf.nn.relu,
+            name = 'filter_layer',
+        )
+        self.output_dense_layer = tf.layers.Dense(
+            hidden_size, use_bias = True, name = 'output_layer'
+        )
+
+    def call(self, x, padding = None):
+        """Return outputs of the feedforward network.
+
+        Args:
+          x: tensor with shape [batch_size, length, hidden_size]
+          padding: (optional) If set, the padding values are temporarily
+            removed from x (provided self.allow_pad is set). The padding
+            values are placed back in the output tensor in the same locations.
+            shape [batch_size, length]
+
+        Returns:
+          Output of the feedforward network.
+          tensor with shape [batch_size, length, hidden_size]
+        """
+        padding = None if not self.allow_pad else padding
+
+        # Retrieve dynamically known shapes
+        batch_size = tf.shape(x)[0]
+        length = tf.shape(x)[1]
+
+        if padding is not None:
+            with tf.name_scope('remove_padding'):
+                # Flatten padding to [batch_size*length]
+                pad_mask = tf.reshape(padding, [-1])
+
+                nonpad_ids = tf.to_int32(tf.where(pad_mask < 1e-9))
+
+                # Reshape x to [batch_size*length, hidden_size] to remove
+                # padding
+                x = tf.reshape(x, [-1, self.hidden_size])
+                x = tf.gather_nd(x, indices = nonpad_ids)
+
+                # Reshape x from 2 dimensions to 3 dimensions.
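The remove-padding / re-add-padding dance that `FeedFowardNetwork.call` performs with `tf.gather_nd` and `tf.scatter_nd` can be sketched in plain Python, flattened to one dimension (hypothetical helper; the real code works on `[batch*length, hidden_size]` tensors):

```python
def apply_without_padding(xs, pad_mask, fn):
    # remove_padding: keep only positions whose mask value is ~0.
    kept = [x for x, p in zip(xs, pad_mask) if p < 1e-9]
    # Run the expensive function only on the non-padding positions.
    processed = [fn(x) for x in kept]
    # re_add_padding: scatter results back, leaving zeros at pad positions.
    out, it = [], iter(processed)
    for p in pad_mask:
        out.append(next(it) if p < 1e-9 else 0)
    return out
```

Skipping padding this way saves compute in the filter layer, which is the widest matmul in the model.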
+                x.set_shape([None, self.hidden_size])
+                x = tf.expand_dims(x, axis = 0)
+
+        output = self.filter_dense_layer(x)
+        if self.train:
+            output = tf.nn.dropout(output, 1.0 - self.relu_dropout)
+        output = self.output_dense_layer(output)
+
+        if padding is not None:
+            with tf.name_scope('re_add_padding'):
+                output = tf.squeeze(output, axis = 0)
+                output = tf.scatter_nd(
+                    indices = nonpad_ids,
+                    updates = output,
+                    shape = [batch_size * length, self.hidden_size],
+                )
+                output = tf.reshape(
+                    output, [batch_size, length, self.hidden_size]
+                )
+        return output
diff --git a/neural-machine-translation/transformer/model_utils.py b/neural-machine-translation/transformer/model_utils.py
new file mode 100644
index 0000000..74fd689
--- /dev/null
+++ b/neural-machine-translation/transformer/model_utils.py
@@ -0,0 +1,115 @@
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Transformer model helper methods."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+
+import tensorflow as tf
+
+_NEG_INF = -1e9
+
+
+def get_position_encoding(
+    length, hidden_size, min_timescale = 1.0, max_timescale = 1.0e4
+):
+    """Return positional encoding.
+
+    Calculates the position encoding as a mix of sine and cosine functions
+    with geometrically increasing wavelengths.
+    Defined and formulated in "Attention Is All You Need", section 3.5.
+
+    Args:
+      length: Sequence length.
+      hidden_size: Size of the position encoding (the model's hidden size).
+      min_timescale: Minimum scale that will be applied at each position
+      max_timescale: Maximum scale that will be applied at each position
+
+    Returns:
+      Tensor with shape [length, hidden_size]
+    """
+    position = tf.to_float(tf.range(length))
+    num_timescales = hidden_size // 2
+    log_timescale_increment = math.log(
+        float(max_timescale) / float(min_timescale)
+    ) / (tf.to_float(num_timescales) - 1)
+    inv_timescales = min_timescale * tf.exp(
+        tf.to_float(tf.range(num_timescales)) * -log_timescale_increment
+    )
+    scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(
+        inv_timescales, 0
+    )
+    signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis = 1)
+    return signal
+
+
+def get_decoder_self_attention_bias(length):
+    """Calculate bias for decoder that maintains model's autoregressive
+    property.
+
+    Creates a tensor that masks out locations that correspond to illegal
+    connections, so prediction at position i cannot draw information from
+    future positions.
+
+    Args:
+      length: int length of sequences in batch.
+
+    Returns:
+      float tensor of shape [1, 1, length, length]
+    """
+    with tf.name_scope('decoder_self_attention_bias'):
+        valid_locs = tf.matrix_band_part(tf.ones([length, length]), -1, 0)
+        valid_locs = tf.reshape(valid_locs, [1, 1, length, length])
+        decoder_bias = _NEG_INF * (1.0 - valid_locs)
+    return decoder_bias
+
+
+def get_padding(x, padding_value = 0):
+    """Return float tensor representing the padding values in x.
+
+    Args:
+      x: int tensor with any shape
+      padding_value: int value that denotes padding in x.
+
+    Returns:
+      float tensor with same shape as x containing values 0 or 1.
+        0 -> non-padding, 1 -> padding
+    """
+    with tf.name_scope('padding'):
+        return tf.to_float(tf.equal(x, padding_value))
+
+
+def get_padding_bias(x):
+    """Calculate bias tensor from padding values in tensor.
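For reference, `get_position_encoding` reduces to a short pure-Python computation (illustrative sketch; a `max(..., 1)` guard is added here to avoid division by zero for `hidden_size = 2`, which the TF version does not do):

```python
import math

def position_encoding(length, hidden_size, min_timescale=1.0, max_timescale=1.0e4):
    # Sines fill the first hidden_size // 2 channels, cosines the rest,
    # with geometrically increasing wavelengths across channels.
    num_timescales = hidden_size // 2
    log_inc = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = [
        min_timescale * math.exp(-log_inc * i) for i in range(num_timescales)
    ]
    signal = []
    for pos in range(length):
        scaled = [pos * it for it in inv_timescales]
        signal.append([math.sin(s) for s in scaled] + [math.cos(s) for s in scaled])
    return signal
```

At position 0 every sine channel is 0 and every cosine channel is 1, which is a quick sanity check when debugging the encoding.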
+
+    Bias tensor that is added to the pre-softmax multi-headed attention
+    logits, which has shape [batch_size, num_heads, length, length]. The
+    tensor is zero at non-padding locations, and -1e9 (effectively negative
+    infinity) at padding locations.
+
+    Args:
+      x: int tensor with shape [batch_size, length]
+
+    Returns:
+      Attention bias tensor of shape [batch_size, 1, 1, length].
+    """
+    with tf.name_scope('attention_bias'):
+        padding = get_padding(x)
+        attention_bias = padding * _NEG_INF
+        attention_bias = tf.expand_dims(
+            tf.expand_dims(attention_bias, axis = 1), axis = 1
+        )
+    return attention_bias
diff --git a/neural-machine-translation/transformer/transformer.py b/neural-machine-translation/transformer/transformer.py
new file mode 100644
index 0000000..69be85d
--- /dev/null
+++ b/neural-machine-translation/transformer/transformer.py
@@ -0,0 +1,501 @@
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Defines the Transformer model, and its encoder and decoder stacks.

+Model paper: https://arxiv.org/pdf/1706.03762.pdf
+Transformer model code source: https://github.com/tensorflow/tensor2tensor
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tensorflow as tf  # pylint: disable=g-bad-import-order
+
+from . import attention_layer
+from . import beam_search
+from . import embedding_layer
+from . import ffn_layer
+from . import model_utils
+
+EOS_ID = 1
+_NEG_INF = -1e9
+
+
+class Transformer(object):
+    """Transformer model for sequence to sequence data.
+
+    Implemented as described in: https://arxiv.org/pdf/1706.03762.pdf
+
+    The Transformer model consists of an encoder and decoder. The input is an
+    int sequence (or a batch of sequences). The encoder produces a continuous
+    representation, and the decoder uses the encoder output to generate
+    probabilities for the output sequence.
+    """
+
+    def __init__(self, params, train):
+        """Initialize layers to build Transformer model.
+
+        Args:
+          params: hyperparameter object defining layer sizes, dropout values,
+            etc.
+          train: boolean indicating whether the model is in training mode.
+            Used to determine if dropout layers should be added.
+        """
+        self.train = train
+        self.params = params
+
+        self.embedding_softmax_layer = embedding_layer.EmbeddingSharedWeights(
+            params['vocab_size'],
+            params['hidden_size'],
+            method = 'matmul' if params['tpu'] else 'gather',
+        )
+        self.encoder_stack = EncoderStack(params, train)
+        self.decoder_stack = DecoderStack(params, train)
+
+    def __call__(self, inputs, targets = None):
+        """Calculate target logits or inferred target sequences.
+
+        Args:
+          inputs: int tensor with shape [batch_size, input_length].
+          targets: None or int tensor with shape [batch_size, target_length].
+
+        Returns:
+          If targets is defined, then return logits for each word in the
+          target sequence. float tensor with shape
+          [batch_size, target_length, vocab_size]
+          If targets is None, then generate the output sequence one token at
+          a time and return a dictionary {
+            outputs: [batch_size, decoded length]
+            scores: [batch_size, float]}
+        """
+        # Variance scaling is used here because it seems to work in many
+        # problems. Other reasonable initializers may also work just as well.
+        initializer = tf.variance_scaling_initializer(
+            self.params['initializer_gain'],
+            mode = 'fan_avg',
+            distribution = 'uniform',
+        )
+        with tf.variable_scope('Transformer', initializer = initializer):
+            # Calculate attention bias for encoder self-attention and decoder
+            # multi-headed attention layers.
+            attention_bias = model_utils.get_padding_bias(inputs)
+
+            # Run the inputs through the encoder layer to map the symbol
+            # representations to continuous representations.
+            encoder_outputs = self.encode(inputs, attention_bias)
+
+            # Generate output sequence if targets is None, or return logits
+            # if target sequence is known.
+            if targets is None:
+                return self.predict(encoder_outputs, attention_bias)
+            else:
+                logits = self.decode(targets, encoder_outputs, attention_bias)
+                return logits
+
+    def encode(self, inputs, attention_bias):
+        """Generate continuous representation for inputs.
+
+        Args:
+          inputs: int tensor with shape [batch_size, input_length].
+          attention_bias: float tensor with shape
+            [batch_size, 1, 1, input_length]
+
+        Returns:
+          float tensor with shape [batch_size, input_length, hidden_size]
+        """
+        with tf.name_scope('encode'):
+            # Prepare inputs to the layer stack by adding positional encodings
+            # and applying dropout.
+            embedded_inputs = self.embedding_softmax_layer(inputs)
+            inputs_padding = model_utils.get_padding(inputs)
+
+            with tf.name_scope('add_pos_encoding'):
+                length = tf.shape(embedded_inputs)[1]
+                pos_encoding = model_utils.get_position_encoding(
+                    length, self.params['hidden_size']
+                )
+                encoder_inputs = embedded_inputs + pos_encoding
+
+            if self.train:
+                encoder_inputs = tf.nn.dropout(
+                    encoder_inputs,
+                    1 - self.params['layer_postprocess_dropout'],
+                )
+
+            return self.encoder_stack(
+                encoder_inputs, attention_bias, inputs_padding
+            )
+
+    def decode(self, targets, encoder_outputs, attention_bias):
+        """Generate logits for each value in the target sequence.
+
+        Args:
+          targets: target values for the output sequence.
+            int tensor with shape [batch_size, target_length]
+          encoder_outputs: continuous representation of input sequence.
+            float tensor with shape [batch_size, input_length, hidden_size]
+          attention_bias: float tensor with shape
+            [batch_size, 1, 1, input_length]
+
+        Returns:
+          float32 tensor with shape [batch_size, target_length, vocab_size]
+        """
+        with tf.name_scope('decode'):
+            # Prepare inputs to decoder layers by shifting targets, adding
+            # positional encoding and applying dropout.
+            decoder_inputs = self.embedding_softmax_layer(targets)
+            with tf.name_scope('shift_targets'):
+                # Shift targets to the right, and remove the last element
+                decoder_inputs = tf.pad(
+                    decoder_inputs, [[0, 0], [1, 0], [0, 0]]
+                )[:, :-1, :]
+            with tf.name_scope('add_pos_encoding'):
+                length = tf.shape(decoder_inputs)[1]
+                decoder_inputs += model_utils.get_position_encoding(
+                    length, self.params['hidden_size']
+                )
+            if self.train:
+                decoder_inputs = tf.nn.dropout(
+                    decoder_inputs,
+                    1 - self.params['layer_postprocess_dropout'],
+                )
+
+            # Run values
+            decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(
+                length
+            )
+            outputs = self.decoder_stack(
+                decoder_inputs,
+                encoder_outputs,
+                decoder_self_attention_bias,
+                attention_bias,
+            )
+            logits = self.embedding_softmax_layer.linear(outputs)
+            return logits
+
+    def _get_symbols_to_logits_fn(self, max_decode_length):
+        """Returns a decoding function that calculates logits of the next
+        tokens."""
+
+        timing_signal = model_utils.get_position_encoding(
+            max_decode_length + 1, self.params['hidden_size']
+        )
+        decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias(
+            max_decode_length
+        )
+
+        def symbols_to_logits_fn(ids, i, cache):
+            """Generate logits for next potential IDs.
+
+            Args:
+              ids: Current decoded sequences.
+                int tensor with shape [batch_size * beam_size, i + 1]
+              i: Loop index
+              cache: dictionary of values storing the encoder output,
+                encoder-decoder attention bias, and previous decoder attention
+                values.
+
+            Returns:
+              Tuple of
+                (logits with shape [batch_size * beam_size, vocab_size],
+                 updated cache values)
+            """
+            # Set decoder input to the last generated IDs
+            decoder_input = ids[:, -1:]
+
+            # Preprocess decoder input by getting embeddings and adding timing
+            # signal.
+            decoder_input = self.embedding_softmax_layer(decoder_input)
+            decoder_input += timing_signal[i : i + 1]
+
+            self_attention_bias = decoder_self_attention_bias[
+                :, :, i : i + 1, : i + 1
+            ]
+            decoder_outputs = self.decoder_stack(
+                decoder_input,
+                cache.get('encoder_outputs'),
+                self_attention_bias,
+                cache.get('encoder_decoder_attention_bias'),
+                cache,
+            )
+            logits = self.embedding_softmax_layer.linear(decoder_outputs)
+            logits = tf.squeeze(logits, axis = [1])
+            return logits, cache
+
+        return symbols_to_logits_fn
+
+    def predict(self, encoder_outputs, encoder_decoder_attention_bias):
+        """Return predicted sequence."""
+        batch_size = tf.shape(encoder_outputs)[0]
+        input_length = tf.shape(encoder_outputs)[1]
+        max_decode_length = input_length + self.params['extra_decode_length']
+
+        symbols_to_logits_fn = self._get_symbols_to_logits_fn(max_decode_length)
+
+        # Create initial set of IDs that will be passed into
+        # symbols_to_logits_fn.
+        initial_ids = tf.zeros([batch_size], dtype = tf.int32)
+
+        # Create cache storing decoder attention values for each layer.
+        cache = {
+            'layer_%d'
+            % layer: {
+                'k': tf.zeros([batch_size, 0, self.params['hidden_size']]),
+                'v': tf.zeros([batch_size, 0, self.params['hidden_size']]),
+            }
+            for layer in range(self.params['num_hidden_layers'])
+        }
+
+        # Add encoder output and attention bias to the cache.
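The per-layer key/value cache built in `predict` can be sketched with plain nested lists (illustrative helpers; the real cache holds `[batch_size, 0, hidden_size]` tensors that beam search grows by one position per step):

```python
def init_cache(num_layers, batch_size):
    # One empty key/value list per layer and batch element; mirrors the
    # 'layer_%d' -> {'k', 'v'} structure in Transformer.predict.
    return {
        'layer_%d' % i: {'k': [[] for _ in range(batch_size)],
                         'v': [[] for _ in range(batch_size)]}
        for i in range(num_layers)
    }

def append_step(cache, layer, new_k, new_v):
    # Each decode step appends one key/value vector per batch element, so
    # attention over earlier positions is never recomputed.
    entry = cache['layer_%d' % layer]
    for b in range(len(new_k)):
        entry['k'][b].append(new_k[b])
        entry['v'][b].append(new_v[b])
```

This incremental cache is what makes autoregressive decoding linear in sequence length per step instead of quadratic.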
+ cache['encoder_outputs'] = encoder_outputs + cache['encoder_decoder_attention_bias'] = encoder_decoder_attention_bias + + # Use beam search to find the top beam_size sequences and scores. + decoded_ids, scores = beam_search.sequence_beam_search( + symbols_to_logits_fn = symbols_to_logits_fn, + initial_ids = initial_ids, + initial_cache = cache, + vocab_size = self.params['vocab_size'], + beam_size = self.params['beam_size'], + alpha = self.params['alpha'], + max_decode_length = max_decode_length, + eos_id = EOS_ID, + ) + + # Get the top sequence for each batch element + top_decoded_ids = decoded_ids[:, 0, 1:] + top_scores = scores[:, 0] + + return {'outputs': top_decoded_ids, 'scores': top_scores} + + +class LayerNormalization(tf.layers.Layer): + """Applies layer normalization.""" + + def __init__(self, hidden_size): + super(LayerNormalization, self).__init__() + self.hidden_size = hidden_size + + def build(self, _): + self.scale = tf.get_variable( + 'layer_norm_scale', + [self.hidden_size], + initializer = tf.ones_initializer(), + ) + self.bias = tf.get_variable( + 'layer_norm_bias', + [self.hidden_size], + initializer = tf.zeros_initializer(), + ) + self.built = True + + def call(self, x, epsilon = 1e-6): + mean = tf.reduce_mean(x, axis = [-1], keepdims = True) + variance = tf.reduce_mean( + tf.square(x - mean), axis = [-1], keepdims = True + ) + norm_x = (x - mean) * tf.rsqrt(variance + epsilon) + return norm_x * self.scale + self.bias + + +class PrePostProcessingWrapper(object): + """Wrapper class that applies layer pre-processing and post-processing.""" + + def __init__(self, layer, params, train): + self.layer = layer + self.postprocess_dropout = params['layer_postprocess_dropout'] + self.train = train + + # Create normalization layer + self.layer_norm = LayerNormalization(params['hidden_size']) + + def __call__(self, x, *args, **kwargs): + # Preprocessing: apply layer normalization + y = self.layer_norm(x) + + # Get layer output + y = self.layer(y, *args, 
**kwargs)
+
+        # Postprocessing: apply dropout and residual connection
+        if self.train:
+            y = tf.nn.dropout(y, 1 - self.postprocess_dropout)
+        return x + y
+
+
+class EncoderStack(tf.layers.Layer):
+    """Transformer encoder stack.
+
+    The encoder stack is made up of N identical layers. Each layer is composed
+    of the sublayers:
+    1. Self-attention layer
+    2. Feedforward network (which is 2 fully-connected layers)
+    """
+
+    def __init__(self, params, train):
+        super(EncoderStack, self).__init__()
+        self.layers = []
+        for _ in range(params['num_hidden_layers']):
+            # Create sublayers for each layer.
+            self_attention_layer = attention_layer.SelfAttention(
+                params['hidden_size'],
+                params['num_heads'],
+                params['attention_dropout'],
+                train,
+            )
+            feed_forward_network = ffn_layer.FeedFowardNetwork(
+                params['hidden_size'],
+                params['filter_size'],
+                params['relu_dropout'],
+                train,
+                params['allow_ffn_pad'],
+            )
+
+            self.layers.append(
+                [
+                    PrePostProcessingWrapper(
+                        self_attention_layer, params, train
+                    ),
+                    PrePostProcessingWrapper(
+                        feed_forward_network, params, train
+                    ),
+                ]
+            )
+
+        # Create final layer normalization layer.
+        self.output_normalization = LayerNormalization(params['hidden_size'])
+
+    def call(self, encoder_inputs, attention_bias, inputs_padding):
+        """Return the output of the encoder layer stacks.
+
+        Args:
+          encoder_inputs: tensor with shape [batch_size, input_length, hidden_size]
+          attention_bias: bias for the encoder self-attention layer.
+            [batch_size, 1, 1, input_length]
+          inputs_padding: float tensor with shape [batch_size, input_length],
+            with 1 at padding positions and 0 elsewhere (passed through to the
+            feed-forward sublayer).
+
+        Returns:
+          Output of encoder layer stack.
+          float32 tensor with shape [batch_size, input_length, hidden_size]
+        """
+        for n, layer in enumerate(self.layers):
+            # Run inputs through the sublayers.
+ self_attention_layer = layer[0] + feed_forward_network = layer[1] + + with tf.variable_scope('layer_%d' % n): + with tf.variable_scope('self_attention'): + encoder_inputs = self_attention_layer( + encoder_inputs, attention_bias + ) + with tf.variable_scope('ffn'): + encoder_inputs = feed_forward_network( + encoder_inputs, inputs_padding + ) + + return self.output_normalization(encoder_inputs) + + +class DecoderStack(tf.layers.Layer): + """Transformer decoder stack. + + Like the encoder stack, the decoder stack is made up of N identical layers. + Each layer is composed of the sublayers: + 1. Self-attention layer + 2. Multi-headed attention layer combining encoder outputs with results from + the previous self-attention layer. + 3. Feedforward network (2 fully-connected layers) + """ + + def __init__(self, params, train): + super(DecoderStack, self).__init__() + self.layers = [] + for _ in range(params['num_hidden_layers']): + self_attention_layer = attention_layer.SelfAttention( + params['hidden_size'], + params['num_heads'], + params['attention_dropout'], + train, + ) + enc_dec_attention_layer = attention_layer.Attention( + params['hidden_size'], + params['num_heads'], + params['attention_dropout'], + train, + ) + feed_forward_network = ffn_layer.FeedFowardNetwork( + params['hidden_size'], + params['filter_size'], + params['relu_dropout'], + train, + params['allow_ffn_pad'], + ) + + self.layers.append( + [ + PrePostProcessingWrapper( + self_attention_layer, params, train + ), + PrePostProcessingWrapper( + enc_dec_attention_layer, params, train + ), + PrePostProcessingWrapper( + feed_forward_network, params, train + ), + ] + ) + + self.output_normalization = LayerNormalization(params['hidden_size']) + + def call( + self, + decoder_inputs, + encoder_outputs, + decoder_self_attention_bias, + attention_bias, + cache = None, + ): + """Return the output of the decoder layer stacks. 
+ + Args: + decoder_inputs: tensor with shape [batch_size, target_length, hidden_size] + encoder_outputs: tensor with shape [batch_size, input_length, hidden_size] + decoder_self_attention_bias: bias for decoder self-attention layer. + [1, 1, target_len, target_length] + attention_bias: bias for encoder-decoder attention layer. + [batch_size, 1, 1, input_length] + cache: (Used for fast decoding) A nested dictionary storing previous + decoder self-attention values. The items are: + {layer_n: {"k": tensor with shape [batch_size, i, key_channels], + "v": tensor with shape [batch_size, i, value_channels]}, + ...} + + Returns: + Output of decoder layer stack. + float32 tensor with shape [batch_size, target_length, hidden_size] + """ + for n, layer in enumerate(self.layers): + self_attention_layer = layer[0] + enc_dec_attention_layer = layer[1] + feed_forward_network = layer[2] + + # Run inputs through the sublayers. + layer_name = 'layer_%d' % n + layer_cache = cache[layer_name] if cache is not None else None + with tf.variable_scope(layer_name): + with tf.variable_scope('self_attention'): + decoder_inputs = self_attention_layer( + decoder_inputs, + decoder_self_attention_bias, + cache = layer_cache, + ) + with tf.variable_scope('encdec_attention'): + decoder_inputs = enc_dec_attention_layer( + decoder_inputs, encoder_outputs, attention_bias + ) + with tf.variable_scope('ffn'): + decoder_inputs = feed_forward_network(decoder_inputs) + + return self.output_normalization(decoder_inputs) diff --git a/neural-machine-translation/transformer/utils.py b/neural-machine-translation/transformer/utils.py new file mode 100644 index 0000000..216e211 --- /dev/null +++ b/neural-machine-translation/transformer/utils.py @@ -0,0 +1,524 @@ +# Copyright 2018 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the 'License'); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an 'AS IS' BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================== +"""Functions for calculating loss, accuracy, and other model metrics. + +Metrics: + - Padded loss, accuracy, and negative log perplexity. Source: + https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/metrics.py + - BLEU approximation. Source: + https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/bleu_hook.py + - ROUGE score. Source: + https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/rouge.py +""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import collections +import math + +import numpy as np +import six +from six.moves import xrange # pylint: disable=redefined-builtin +import tensorflow as tf + + +def _pad_tensors_to_same_length(x, y): + """Pad x and y so that the results have the same length (second dimension).""" + with tf.name_scope('pad_to_same_length'): + x_length = tf.shape(x)[1] + y_length = tf.shape(y)[1] + + max_length = tf.maximum(x_length, y_length) + + x = tf.pad(x, [[0, 0], [0, max_length - x_length], [0, 0]]) + y = tf.pad(y, [[0, 0], [0, max_length - y_length]]) + return x, y + + +def padded_cross_entropy_loss(logits, labels, smoothing, vocab_size): + """Calculate cross entropy loss while ignoring padding. 
+ + Args: + logits: Tensor of size [batch_size, length_logits, vocab_size] + labels: Tensor of size [batch_size, length_labels] + smoothing: Label smoothing constant, used to determine the on and off values + vocab_size: int size of the vocabulary + Returns: + Returns the cross entropy loss and weight tensors: float32 tensors with + shape [batch_size, max(length_logits, length_labels)] + """ + with tf.name_scope('loss', values = [logits, labels]): + logits, labels = _pad_tensors_to_same_length(logits, labels) + + # Calculate smoothing cross entropy + with tf.name_scope( + 'smoothing_cross_entropy', values = [logits, labels] + ): + confidence = 1.0 - smoothing + low_confidence = (1.0 - confidence) / tf.to_float(vocab_size - 1) + soft_targets = tf.one_hot( + tf.cast(labels, tf.int32), + depth = vocab_size, + on_value = confidence, + off_value = low_confidence, + ) + xentropy = tf.nn.softmax_cross_entropy_with_logits_v2( + logits = logits, labels = soft_targets + ) + + # Calculate the best (lowest) possible value of cross entropy, and + # subtract from the cross entropy loss. + normalizing_constant = -( + confidence * tf.log(confidence) + + tf.to_float(vocab_size - 1) + * low_confidence + * tf.log(low_confidence + 1e-20) + ) + xentropy -= normalizing_constant + + weights = tf.to_float(tf.not_equal(labels, 0)) + return xentropy * weights, weights + + +def _convert_to_eval_metric(metric_fn): + """Wrap a metric fn that returns scores and weights as an eval metric fn. + + The input metric_fn returns values for the current batch. The wrapper + aggregates the return values collected over all of the batches evaluated. + + Args: + metric_fn: function that returns scores and weights for the current batch's + logits and predicted labels. + + Returns: + function that aggregates the scores and weights from metric_fn. 
+ """ + + def problem_metric_fn(*args): + """Returns an aggregation of the metric_fn's returned values.""" + (scores, weights) = metric_fn(*args) + + # The tf.metrics.mean function assures correct aggregation. + return tf.metrics.mean(scores, weights) + + return problem_metric_fn + + +def get_eval_metrics(logits, labels, params): + """Return dictionary of model evaluation metrics.""" + metrics = { + 'accuracy': _convert_to_eval_metric(padded_accuracy)(logits, labels), + 'accuracy_top5': _convert_to_eval_metric(padded_accuracy_top5)( + logits, labels + ), + 'accuracy_per_sequence': _convert_to_eval_metric( + padded_sequence_accuracy + )(logits, labels), + 'neg_log_perplexity': _convert_to_eval_metric( + padded_neg_log_perplexity + )(logits, labels, params['vocab_size']), + } + + if not params['use_tpu']: + # TPU does not support tf.py_func + metrics.update( + { + 'approx_bleu_score': _convert_to_eval_metric(bleu_score)( + logits, labels + ), + 'rouge_2_fscore': _convert_to_eval_metric(rouge_2_fscore)( + logits, labels + ), + 'rouge_L_fscore': _convert_to_eval_metric(rouge_l_fscore)( + logits, labels + ), + } + ) + + # Prefix each of the metric names with "metrics/". This allows the metric + # graphs to display under the "metrics" category in TensorBoard. 
+    metrics = {'metrics/%s' % k: v for k, v in six.iteritems(metrics)}
+    return metrics
+
+
+def padded_accuracy(logits, labels):
+    """Percentage of times that predictions match labels on non-0s."""
+    with tf.variable_scope('padded_accuracy', values = [logits, labels]):
+        logits, labels = _pad_tensors_to_same_length(logits, labels)
+        weights = tf.to_float(tf.not_equal(labels, 0))
+        outputs = tf.to_int32(tf.argmax(logits, axis = -1))
+        padded_labels = tf.to_int32(labels)
+        return tf.to_float(tf.equal(outputs, padded_labels)), weights
+
+
+def padded_accuracy_topk(logits, labels, k):
+    """Percentage of times that top-k predictions match labels on non-0s."""
+    with tf.variable_scope('padded_accuracy_topk', values = [logits, labels]):
+        logits, labels = _pad_tensors_to_same_length(logits, labels)
+        weights = tf.to_float(tf.not_equal(labels, 0))
+        effective_k = tf.minimum(k, tf.shape(logits)[-1])
+        _, outputs = tf.nn.top_k(logits, k = effective_k)
+        outputs = tf.to_int32(outputs)
+        padded_labels = tf.to_int32(labels)
+        padded_labels = tf.expand_dims(padded_labels, axis = -1)
+        padded_labels += tf.zeros_like(outputs)  # Pad to same shape.
+        same = tf.to_float(tf.equal(outputs, padded_labels))
+        same_topk = tf.reduce_sum(same, axis = -1)
+        return same_topk, weights
+
+
+def padded_accuracy_top5(logits, labels):
+    return padded_accuracy_topk(logits, labels, 5)
+
+
+def padded_sequence_accuracy(logits, labels):
+    """Percentage of times that predictions match labels everywhere (non-0)."""
+    with tf.variable_scope(
+        'padded_sequence_accuracy', values = [logits, labels]
+    ):
+        logits, labels = _pad_tensors_to_same_length(logits, labels)
+        weights = tf.to_float(tf.not_equal(labels, 0))
+        outputs = tf.to_int32(tf.argmax(logits, axis = -1))
+        padded_labels = tf.to_int32(labels)
+        not_correct = (
+            tf.to_float(tf.not_equal(outputs, padded_labels)) * weights
+        )
+        axis = list(range(1, len(outputs.get_shape())))
+        correct_seq = 1.0 - tf.minimum(
+            1.0, tf.reduce_sum(not_correct, axis = axis)
+        )
+        return correct_seq, tf.constant(1.0)
+
+
+def padded_neg_log_perplexity(logits, labels, vocab_size):
+    """Average log-perplexity excluding padding 0s. No smoothing."""
+    num, den = padded_cross_entropy_loss(logits, labels, 0, vocab_size)
+    return -num, den
+
+
+def bleu_score(logits, labels):
+    """Approximate BLEU score computation between labels and predictions.
+
+    An approximate BLEU scoring method since we do not glue word pieces or
+    decode the ids and tokenize the output. By default, we use ngram order of 4
+    and use brevity penalty. Also, this does not have beam search.
+
+    Args:
+      logits: Tensor of size [batch_size, length_logits, vocab_size]
+      labels: Tensor of size [batch_size, length_labels]
+
+    Returns:
+      bleu: float32 scalar, the approximate BLEU score
+    """
+    predictions = tf.to_int32(tf.argmax(logits, axis = -1))
+    # TODO: Look into removing use of py_func
+    bleu = tf.py_func(compute_bleu, (labels, predictions), tf.float32)
+    return bleu, tf.constant(1.0)
+
+
+def _get_ngrams_with_counter(segment, max_order):
+    """Extracts all n-grams up to a given maximum order from an input segment.
+
+    Args:
+      segment: text segment from which n-grams will be extracted.
+      max_order: maximum length in tokens of the n-grams returned by this
+        method.
+
+    Returns:
+      The Counter containing all n-grams up to max_order in segment
+      with a count of how many times each n-gram occurred.
+    """
+    ngram_counts = collections.Counter()
+    for order in xrange(1, max_order + 1):
+        for i in xrange(0, len(segment) - order + 1):
+            ngram = tuple(segment[i : i + order])
+            ngram_counts[ngram] += 1
+    return ngram_counts
+
+
+def compute_bleu(
+    reference_corpus, translation_corpus, max_order = 4, use_bp = True
+):
+    """Computes BLEU score of translated segments against one or more references.
+
+    Args:
+      reference_corpus: list of references for each translation. Each
+        reference should be tokenized into a list of tokens.
+      translation_corpus: list of translations to score. Each translation
+        should be tokenized into a list of tokens.
+      max_order: Maximum n-gram order to use when computing BLEU score.
+      use_bp: boolean, whether to apply brevity penalty.
+
+    Returns:
+      BLEU score.
+    """
+    reference_length = 0
+    translation_length = 0
+    bp = 1.0
+    geo_mean = 0
+
+    matches_by_order = [0] * max_order
+    possible_matches_by_order = [0] * max_order
+
+    for (references, translations) in zip(reference_corpus, translation_corpus):
+        reference_length += len(references)
+        translation_length += len(translations)
+        ref_ngram_counts = _get_ngrams_with_counter(references, max_order)
+        translation_ngram_counts = _get_ngrams_with_counter(
+            translations, max_order
+        )
+
+        overlap = dict(
+            (ngram, min(count, translation_ngram_counts[ngram]))
+            for ngram, count in ref_ngram_counts.items()
+        )
+
+        for ngram in overlap:
+            matches_by_order[len(ngram) - 1] += overlap[ngram]
+        for ngram in translation_ngram_counts:
+            possible_matches_by_order[
+                len(ngram) - 1
+            ] += translation_ngram_counts[ngram]
+
+    precisions = [0] * max_order
+    smooth = 1.0
+
+    for i in xrange(0, max_order):
+        if possible_matches_by_order[i] > 0:
+            if matches_by_order[i] > 0:
+                precisions[i] = (
+                    float(matches_by_order[i]) / possible_matches_by_order[i]
+                )
+            else:
+                # No matches at this order: back off to an exponentially
+                # shrinking smoothed precision.
+                smooth *= 2
+                precisions[i] = 1.0 / (smooth * possible_matches_by_order[i])
+        else:
+            precisions[i] = 0.0
+
+    if max(precisions) > 0:
+        p_log_sum = sum(math.log(p) for p in precisions if p)
+        geo_mean = math.exp(p_log_sum / max_order)
+
+    if use_bp:
+        ratio = translation_length / reference_length
+        bp = math.exp(1 - 1.0 / ratio) if ratio < 1.0 else 1.0
+    bleu = geo_mean * bp
+    return np.float32(bleu)
+
+
+def rouge_2_fscore(logits, labels):
+    """ROUGE-2 F1 score computation between labels and predictions.
+
+    This is an approximate ROUGE scoring method since we do not glue word pieces
+    or decode the ids and tokenize the output.
+
+    Args:
+      logits: tensor, model predictions
+      labels: tensor, gold output.
+
+    Returns:
+      rouge2_fscore: approx rouge-2 f1 score.
+ """ + predictions = tf.to_int32(tf.argmax(logits, axis = -1)) + # TODO: Look into removing use of py_func + rouge_2_f_score = tf.py_func(rouge_n, (predictions, labels), tf.float32) + return rouge_2_f_score, tf.constant(1.0) + + +def _get_ngrams(n, text): + """Calculates n-grams. + + Args: + n: which n-grams to calculate + text: An array of tokens + + Returns: + A set of n-grams + """ + ngram_set = set() + text_length = len(text) + max_index_ngram_start = text_length - n + for i in range(max_index_ngram_start + 1): + ngram_set.add(tuple(text[i : i + n])) + return ngram_set + + +def rouge_n(eval_sentences, ref_sentences, n = 2): + """Computes ROUGE-N f1 score of two text collections of sentences. + + Source: https://www.microsoft.com/en-us/research/publication/ + rouge-a-package-for-automatic-evaluation-of-summaries/ + + Args: + eval_sentences: Predicted sentences. + ref_sentences: Sentences from the reference set + n: Size of ngram. Defaults to 2. + + Returns: + f1 score for ROUGE-N + """ + f1_scores = [] + for eval_sentence, ref_sentence in zip(eval_sentences, ref_sentences): + eval_ngrams = _get_ngrams(n, eval_sentence) + ref_ngrams = _get_ngrams(n, ref_sentence) + ref_count = len(ref_ngrams) + eval_count = len(eval_ngrams) + + # Count the overlapping ngrams between evaluated and reference + overlapping_ngrams = eval_ngrams.intersection(ref_ngrams) + overlapping_count = len(overlapping_ngrams) + + # Handle edge case. This isn't mathematically correct, but it's good enough + if eval_count == 0: + precision = 0.0 + else: + precision = float(overlapping_count) / eval_count + if ref_count == 0: + recall = 0.0 + else: + recall = float(overlapping_count) / ref_count + f1_scores.append( + 2.0 * ((precision * recall) / (precision + recall + 1e-8)) + ) + + # return overlapping_count / reference_count + return np.mean(f1_scores, dtype = np.float32) + + +def rouge_l_fscore(predictions, labels): + """ROUGE scores computation between labels and predictions. 
+
+    This is an approximate ROUGE scoring method since we do not glue word pieces
+    or decode the ids and tokenize the output.
+
+    Args:
+      predictions: tensor, model predictions
+      labels: tensor, gold output.
+
+    Returns:
+      rouge_l_fscore: approx rouge-l f1 score.
+    """
+    outputs = tf.to_int32(tf.argmax(predictions, axis = -1))
+    rouge_l_f_score = tf.py_func(
+        rouge_l_sentence_level, (outputs, labels), tf.float32
+    )
+    return rouge_l_f_score, tf.constant(1.0)
+
+
+def rouge_l_sentence_level(eval_sentences, ref_sentences):
+    """Computes ROUGE-L (sentence level) of two collections of sentences.
+
+    Source: https://www.microsoft.com/en-us/research/publication/
+    rouge-a-package-for-automatic-evaluation-of-summaries/
+
+    Calculated according to:
+      R_lcs = LCS(X,Y)/m
+      P_lcs = LCS(X,Y)/n
+      F_lcs = ((1 + beta^2)*R_lcs*P_lcs) / (R_lcs + (beta^2) * P_lcs)
+
+    where:
+      X = reference summary
+      Y = Candidate summary
+      m = length of reference summary
+      n = length of candidate summary
+
+    Args:
+      eval_sentences: The sentences that have been picked by the summarizer
+      ref_sentences: The sentences from the reference set
+
+    Returns:
+      A float: F_lcs
+    """
+
+    f1_scores = []
+    for eval_sentence, ref_sentence in zip(eval_sentences, ref_sentences):
+        m = float(len(ref_sentence))
+        n = float(len(eval_sentence))
+        lcs = _len_lcs(eval_sentence, ref_sentence)
+        f1_scores.append(_f_lcs(lcs, m, n))
+    return np.mean(f1_scores, dtype = np.float32)
+
+
+def _len_lcs(x, y):
+    """Returns the length of the Longest Common Subsequence between two seqs.
+
+    Source: http://www.algorithmist.com/index.php/Longest_Common_Subsequence
+
+    Args:
+      x: sequence of words
+      y: sequence of words
+
+    Returns:
+      integer: Length of LCS between x and y
+    """
+    table = _lcs(x, y)
+    n, m = len(x), len(y)
+    return table[n, m]
+
+
+def _lcs(x, y):
+    """Computes the length of the LCS between two seqs.
+
+    The implementation below uses a dynamic programming algorithm and runs
+    in O(nm) time where n = len(x) and m = len(y).
+ Source: http://www.algorithmist.com/index.php/Longest_Common_Subsequence + + Args: + x: collection of words + y: collection of words + + Returns: + Table of dictionary of coord and len lcs + """ + n, m = len(x), len(y) + table = dict() + for i in range(n + 1): + for j in range(m + 1): + if i == 0 or j == 0: + table[i, j] = 0 + elif x[i - 1] == y[j - 1]: + table[i, j] = table[i - 1, j - 1] + 1 + else: + table[i, j] = max(table[i - 1, j], table[i, j - 1]) + return table + + +def _f_lcs(llcs, m, n): + """Computes the LCS-based F-measure score. + + Source: http://research.microsoft.com/en-us/um/people/cyl/download/papers/ + rouge-working-note-v1.3.1.pdf + + Args: + llcs: Length of LCS + m: number of words in reference summary + n: number of words in candidate summary + + Returns: + Float. LCS-based F-measure score + """ + r_lcs = llcs / m + p_lcs = llcs / n + beta = p_lcs / (r_lcs + 1e-12) + num = (1 + (beta ** 2)) * r_lcs * p_lcs + denom = r_lcs + ((beta ** 2) * p_lcs) + f_lcs = num / (denom + 1e-12) + return f_lcs diff --git a/neural-machine-translation/util.py b/neural-machine-translation/util.py deleted file mode 100644 index ce35290..0000000 --- a/neural-machine-translation/util.py +++ /dev/null @@ -1,45 +0,0 @@ -# Copyright 2017 Google Inc. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# ============================================================================== -"""DNC util ops and modules.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import tensorflow as tf - - -def batch_invert_permutation(permutations): - """Returns batched `tf.invert_permutation` for every row in `permutations`.""" - with tf.name_scope('batch_invert_permutation', values=[permutations]): - unpacked = tf.unstack(permutations) - inverses = [tf.invert_permutation(permutation) for permutation in unpacked] - return tf.stack(inverses) - - -def batch_gather(values, indices): - """Returns batched `tf.gather` for every row in the input.""" - with tf.name_scope('batch_gather', values=[values, indices]): - unpacked = zip(tf.unstack(values), tf.unstack(indices)) - result = [tf.gather(value, index) for value, index in unpacked] - return tf.stack(result) - - -def one_hot(length, index): - """Return an nd array of given `length` filled with 0s and a 1 at `index`.""" - result = np.zeros(length) - result[index] = 1 - return result diff --git a/neural-machine-translation/vietnam-train b/neural-machine-translation/vietnam-train deleted file mode 100644 index e0775e4..0000000 --- a/neural-machine-translation/vietnam-train +++ /dev/null @@ -1,500 +0,0 @@ -Khoa học đằng sau một tiêu đề về khí hậu -Trong 4 phút , chuyên gia hoá học khí quyển Rachel Pike giới thiệu sơ lược về những nỗ lực khoa học miệt mài đằng sau những tiêu đề táo bạo về biến đổi khí hậu , cùng với đoàn nghiên cứu của mình -- hàng ngàn người đã cống hiến cho dự án này -- một chuyến bay mạo hiểm qua rừng già để tìm kiếm thông tin về một phân tử then chốt . -Tôi muốn cho các bạn biết về sự to lớn của những nỗ lực khoa học đã góp phần làm nên các dòng tít bạn thường thấy trên báo . -Có những dòng trông như thế này khi bàn về biến đổi khí hậu , và như thế này khi nói về chất lượng không khí hay khói bụi . 
-Cả hai đều là một nhánh của cùng một lĩnh vực trong ngành khoa học khí quyển . -Các tiêu đề gần đây trông như thế này khi Ban Điều hành Biến đổi khí hậu Liên chính phủ , gọi tắt là IPCC đưa ra bài nghiên cứu của họ về hệ thống khí quyển . -Nghiên cứu được viết bởi 620 nhà khoa học từ 40 quốc gia khác nhau . -Họ viết gần 1000 trang về chủ đề này . -Và tất cả các trang đều được xem xét bởi 400 khoa học gia và nhà phê bình khác từ 113 quốc gia . -Đó là cả một cộng đồng lớn , lớn đến nỗi trên thực tế cuộc tụ hội hằng năm của chúng tôi là hội nghị khoa học [ tự nhiên ] lớn nhất thế giới . -Mỗi năm , hơn 15,000 nhà khoa học đến San Francisco để tham dự hội nghị này . -Mỗi một khoa học gia đều thuộc một nhóm nghiên cứu , và mỗi nhóm đều nghiên cứu rất nhiều đề tài đa dạng . -Với chúng tôi , tại Cambridge , các đề tài thay đổi từ sự dao động của El Niño , vốn có tác động đến thời tiết và khí hậu , sự đồng hoá thông tin từ vệ tinh , khí thải từ những cánh đồng nhiên liệu sinh học , tình cờ lại là đề tài tôi nghiên cứu . -Mỗi lĩnh vực nghiên cứu lại chia ra những lĩnh vực nhỏ hơn , và những nghiên cứu sinh có bằng tiến sĩ , như tôi , phải nghiên cứu những đề tài vô cùng cụ thể , cụ thể như chỉ vài quy trình hay vài phân tử . -Một trong số những phân tử tôi nghiên cứu tên là isoprene . Đây . Nó là một phân tử hữu cơ nhỏ . Có thể các bạn cũng chưa từng nghe tên . -Trọng lượng của một chiếc kẹp giấy vào khoảng 900 zeta-illion -- 10 mũ 21 -- phân tử isoprene . -Dù trọng lượng phân tử rất nhỏ , thế nhưng lượng isoprene được thải vào khí quyển hàng năm ngang ngửa với tổng trọng lượng của dân số toàn cầu . -Đó là một lượng khí thải khổng lồ , bằng tổng trọng lượng của mêtan . -Chính vì lượng khí thải rất lớn , nó có ý nghĩa quan trọng với hệ thống khí quyển . -Chính vì nó có ý nghĩa quan trọng với hệ thống khí quyển , giá nào chúng tôi cũng theo đuổi nghiên cứu này đến cùng . -Chúng tôi cho nó nổ và xem xét từng mảnh nhỏ . -Đây là Phòng nghiên cứu khói bụi EUPHORE ở Tây Ban Nha . 
-Nổ trong không khí hay cháy hoàn toàn diễn ra chậm hơn 15,000 lần so với những phản ứng trong động cơ xe . -Dù vậy , chúng tôi vẫn xem xét từng mảnh nhỏ . -Chúng tôi chạy những mô hình khổng lồ trên siêu máy tính ; đây là công việc của tôi . -Mô hình của chúng tôi gồm hàng trăm ngàn thùng xếp chồng tính toán với hàng trăm biến số trong thời gian cực ngắn . -Mà vẫn cần hàng tuần mới thực hiện xong các phép tích phân . -Chúng tôi cần làm hàng tá phép tính như thế để hiểu được những gì đang xảy ra . -Chúng tôi còn bay khắp thế giới để tìm phân tử này . -Gần đây tôi tham gia một cuộc khảo sát thực địa ở Malaysia . Còn nhiều chuyến khác nữa . -Chúng tôi tìm thấy một tháp canh khí hậu toàn cầu ở đó , ngay giữa rừng sâu , và chúng tôi treo các thiết bị nghiên cứu trị giá hàng trăm ngàn đô la xa khỏi cái tháp để tìm isoprene , và tất nhiên là những thứ khác nữa trong suốt thời gian ở đó . -Đây chính là cái tháp giữa rừng sâu , nhìn từ trên cao . -Và từ dưới đất . -Có giai đoạn chúng tôi còn mang cả máy bay theo . -Chiếc phi cơ này , mẫu BA146 do FAAM sở hữu thông thường có thể chở từ 120-130 người . -Rất có thể bạn đã ở trên một chiếc tương tự khi đến đây hôm nay . -Chúng tôi không chỉ bay . Chúng tôi bay cách tầng vòm của rừng 100 mét để đo đạc phân tử này -- chuyện vô cùng nguy hiểm . -Chúng tôi phải bay với độ nghiêng đặc biệt để thực hiện các phép đo . -Phải thuê quân đội và sát hạch phi cơ để điều khiển máy bay . -Phải xin lệnh đặc biệt cho phép bay . -Khi bay quanh những bờ sông ở thung lũng , các lực tác động có thể lên tới 2G . -Các nhà khoa học phải được thắt chặt hoàn toàn vào ghế để có thể thực hiện đo đạc trên máy bay . -Vì vậy , bạn có thể hình dung , bên trong đó hoàn toàn không giống với bất kỳ chiếc máy bay du lịch nào khác . -Đó là cả một phòng thí nghiệm di động để giúp chúng tôi thực hiện các phép đo . -Chúng tôi làm tất cả chỉ để tìm hiểu tính chất hoá học của một phân tử . 
-Khi nghiên cứu sinh như tôi có sở thích hay hiểu biết về phân tử đó , đại loại như thế , họ sẽ viết cả một bài nghiên cứu khoa học về đề tài đó . -Và ngoài cuộc khảo sát đó chúng tôi sẽ còn hàng tá bài nghiên cứu về hàng tá các quy trình hay phân tử . -Khi một phần kiến thức dần định hình , nó sẽ tạo thành một tiểu mục , hay một tiểu-tiểu mục trong một bản kiểm định như ở IPCC , mặc dù còn có nhiều bài khác . -Mỗi một chương trong 11 chương của IPCC có từ 6 đến 10 tiểu mục như thế . -Nói như thế để bạn hình dung được quy mô của những nỗ lực này . -Trong mỗi bản đánh giá chúng tôi viết , chúng tôi luôn đính kèm một bản tóm lược , được viết cho những độc giả không chuyên về khoa học . -Chúng tôi đưa bản tóm lược cho các nhà báo và nhà chính sách để có được những dòng tít như thế này . -Cám ơn rất nhiều . -Christopher deCharms quét não bộ theo thời gian thực -Nhà thần kinh học và nhà sáng chế Christopher deCharms cho thấy cách sử dụng fMRI để ghi lại những hoạt động của não bộ -- suy nghĩ , cảm xúc , đau đớn-- ngay khi chúng xảy ra . Nói cách khác , bạn thực sự nhìn thấy những gì bạn cảm nhận . -Xin chào . Tôi đề nghị các bạn giơ tay và vẫy về phía sau như tôi làm đây -- như cách vẫy hoàng gia . -Bắt chước những gì bạn nhìn thấy . -Bạn có thể lập trình cho hàng trăm cơ bắp trong cánh tay . -Bạn sẽ sớm nhìn được vào bên trong não bộ , điều khiển hàng trăm vùng trên não mà bạn thấy đó . -Tôi sẽ cho bạn biết về công nghệ đó . -Con người muốn nhìn được vào bên trong ý nghĩ , não người , qua hàng ngàn năm nay , -Vâng , bước ra khỏi phòng thì nghiệm có vẻ như là cách thế hệ chúng ta giải quyết việc đó . -Con người tưởng tượng điều này sẽ rất khó . -Bạn sẽ phải thu nhỏ một chiếc phi thuyền , đưa vào trong mạch máu . -Điều này thực sự nguy hiểm . -bạn có thể bị các bạch cầu tấn công trong động mạch . -Nhưng bây giờ , chúng ta có một công nghệ thực để làm việc này . -Chúng ta sẽ bay vào trong não của đồng nghiệp của tôi , Peter . 
-Chúng ta sẽ làm điều này mà không xâm nhập bên trong , sử dụng MRI . -Không cần bơm thứ gì vào . Không cần đến tia bức xạ . -Chúng ta sẽ có thể bay vào hộp sọ của não Peter -- theo nghĩa đen là bay vào cơ thể anh ta -- nhưng quan trọng hơn , ta có thể nhìn vào tâm trí của anh ta . -Khi Peter di chuyển cánh tay , chấm màu vàng mà bạn thấy ở đó là bề mặt mà ý thức của Peter hoạt động -Trước đó bạn đã thấy là với các điện cực ta có thể điều khiển cánh tay robot , và ảnh chụp não bộ và máy quét có thể cho thấy bên trong của não . -Cái mới chính là quá trình đã từng chiếm nhiều ngày hoặc nhiều tháng để phân tích . -Chúng ta đã đạt được điều này qua công nghệ đến mức mili-giây. và điều này cho phép Peter nhìn thấy não bộ anh ấy dưới thời gian thực khi anh ta ở trong máy quét . -Anh ta có thể thấy 65000 điểm kích hoạt trên 1 giây . -Nếu thấy được mẫu này của não bộ , anh ta có thể học cách điều khiển nó . -Có ba cách để làm ảnh hưởng đến não : giường của nhà trị liệu học , thuốc viên và con dao . -Đây là lụa chọn thứ tư mà bạn sẽ nhanh chóng có . -Ta đều biết rằng khi suy nghĩ , ta đã tạo ra những kênh nằm sâu trong tâm trí và trong não . -Đau kinh niên là một ví dụ . Nếu bạn phỏng , bạn sẽ giật tay ra xa . -Nhưng nếu trong sáu tháng , hay sáu năm , cơn đau vẫn không dứt , đó là vì những vòng tuần hoàn này đang sản xuất ra cơn đau chống lại bạn . -Nếu ta có thể nhìn vào các xung kích hoạt của não sản xuất ra cơn đau , ta có thể lập ra các mô hình 3 chiều và nhìn thấy các thông tin quá trình của não theo thời gian thực , chúng ta có thể lựa chọn vùng sản xuất cơn đau . -Vâng , hãy giơ tay lên và cong cơ cánh tay lại . -Bây giờ hãy tưởng tượng bạn sớm được nhìn vào bên trong não mình và được chọn vùng trên não để làm cùng một việc đó . -Và điều bạn thấy là , Chúng ta đã chọn đường đi trong não bộ của một bệnh nhân đau kinh niên . -Điều này có thể làm bạn shock , nhưng thực sự chúng ta đã đọc được não bộ của người này theo thời gian thực . 
-Họ đang theo dõi hoạt động não bộ của chính họ , và điều khiển con đường gây nên cơn đau . -Họ tập co dãn hệ thống tiết ra chất xoa dịu từ bên trong . -Khi họ làm vậy , ở góc trên bên trái ta thấy được thứ đã kết nối kích hoạt não bộ của cơn đau đang được điều khiển của họ . -Khi họ điều khiển não bộ , họ điều khiển cơn đau của mình . -Đây là một phương pháp thử nghiệm , nhưng trong các phép thử chúng ta đã thấy lượng bệnh nhân đau kinh niên giảm 44-64 % . -Đây không phải là " Ma trận . " Bạn chỉ có thể làm điều này với chính bản thân mình . Bạn tự làm chủ . -Tôi đã nhìn vào trong não mình . Bạn cũng sớm làm như vậy thôi . -Khi bạn làm được , bạn muốn điều khiển cái gì ? -Bạn có thể nhìn vào mọi phương diện khiến bạn là chính mình , tất cả những kí ức . -Đây là một trong số những vấn đề chúng ta nói đến hôm nay mà tôi không có thời gian để đi vào chi tiết . -Nhưng tôi có một câu hỏi lớn dành cho bạn . -Chúng ta là thế hệ đầu tiên có thể bước vào , sử dụng công nghệ này , trí tuệ con người và não bộ , -Chúng ta sẽ đưa nó đến đâu ? -Beeban Kidron : Đìều kì diệu của điện ảnh -Những bộ phim có sức mạnh tạo nên những kinh nghiệm tường thuật được chia sẻ và định hình những kí ức và những cách nhìn thế giới . Nhà đạo diễn phim người Anh - Beeban Kidron dẫn chứng bằng những cảnh phim hình tượng -- từ " Phép màu ở Milan " đến " Những câu bé và khu dân cư " -- khi bà kể cách mà nhóm FILMCLUB của bà chia sẻ những thước phim vĩ đại với lũ trẻ . -Bằng chứng cho thấy rằng con người ở mọi lứa tuổi và từ mọi nền văn hoá tạo ra danh tính của họ theo một dạng tường thuật nào đó . -Từ mẹ đến con gái , người thuyết giáo đến người theo hội , giáo viên đến học sinh , người kể chuyện đến khán thính giả . -Bất kể ở các hình vẽ trong hang động hay các cách dùng mới nhất của Internet , con người luôn kể câu chuyện lịch sử và về những sự thật qua dụ ngôn và chuyện tưởng tượng -Chúng ta là những người kể chuyện bản năng . 
-Nhưng ở đâu , trong thế giới già nhanh chóng già cỗi và chia nhỏ của chúng ta , chúng ta trao tặng những kinh nghiệm mang tính cộng đồng , không qua trung gian bởi quyền lợi tiêu dùng kịch liệt của chính chúng ta ? -Và chuyện tường thuật nào , lịch sử nào , bản sắc nào , qui tắc đạo đức nào mà chúng ta đang truyền đạt lại cho thế hệ trẻ của chúng ta ? -Điện ảnh đáng được tranh cãi là dạng nghệ thuật ảnh hưởng nhất trong thế kỉ 20 . -Nó là những câu chuyện kể của các nghệ sĩ vượt qua các ranh giới quốc gia , dưới vô vàn ngôn ngữ , thể loại và triết lý mà một người có thể tưởng tượng ra được . -Thực sự là , thật khó để tìm một chủ đề mà điện ảnh chưa động đến . -Trong suốt thập kỉ qua chúng ta đang chứng kiến sự hội nhập rộng lớn của phương tiện truyền thông toàn cầu , giờ bị thống trị bởi văn hoá phim bom tấn Hollywood . -Chúng ta đang được phục vụ một chế độ " ăn kiêng " mà sự cảm giác là chủ chốt , chứ không phải nội dung . -Điều gì đã quen thuộc với tất cả chúng ta 40 năm trước -- việc kể các câu chuyện giữa các thế hệ-- bây giờ rất hiếm hoi . -Là một nhà làm phim , điều đó làm tôi lo ngại . -Là một con người , nó reo sự sợ hãi của Chúa vào tôi . -Tương lai nào những con người trẻ có thể xây dựng với những nắm bắt quá nhỏ bé về nơi họ sinh ra và quá ít những câu chuyện tường thuật về chuyện gì là có thể ? -Thật quá mỉa mai ; cơ hội nắm bắt kĩ thuật chưa bao giờ lớn hơn thế , cơ hội nắm bắt văn hoá chưa bao giờ yếu hơn thế . -Và vì vậy vào năm 2006 chúng tôi lập FILMCLUB , một tổ chức định kì hàng tuần chiếu phim trong các trường học và sau đó là các cuộc thảo luận . -Nếu chúng ta có thể tra soát biên niên sử 100 năm của phim , có lẽ chúng ta có thể xây dựng một chuyện tường thuật mang ý nghĩa đến thế giới phân mảnh và không ngừng nghỉ của thế hệ trẻ . -Được tiếp xúc với công nghệ , ngay cả trường học ở một thôn ngoại thành nhỏ bé có thể chiếu một DVD lên một bảng trắng . 
-Trong 9 tháng đầu tiên chúng tôi cho chạy 25 câu lạc bộ dọc nước Anh , cho những nhóm trẻ em từ 5 đến 18 tuổi xem một bộ phim không bị ngắt quãng trong 90 phút . -Những bộ phim được biên đạo và bối cảnh hoá . -Nhưng sự lựa chọn thuộc về chúng , và khán thính giả của chúng tôi tăng lên nhanh chóng để chọn những món " ăn kiêng " giàu nhất và đa dạng nhất mà chúng tôi có thể cung cấp . -Có kết quả ngay lập tức . -Đó là cách giáo dục thâm tuý và có khả năng truyền tải nhất . -Một nhóm có tối đa 150 và tối thiểu 3 người , những bạn trẻ này khám phá những nơi mới , những suy nghĩ mới , những góc nhìn mới . -Ngay khi thử nghiệm kết thúc , chúng ta đã có tên của hàng ngàn trường học mong muốn được tham gia . -Bộ phim đã thay đổi cuộc đời của tôi là bộ phim năm 1951 của Vittorio De Sica , " Phép màu ở Milan " . -Đó là một lời nhận xét đáng chú ý trong những khu ổ chuột , nghèo đói và khát vọng . -Tôi đã xem bộ phim vào dịp sinh nhật lần thứ 50 của bố tôi . -Công nghệ lúc đó đã khiến chúng ta phải thuê một rạp để xem , tìm và trả cho việc in tráng và người chiếu phim . -Nhưng với cha của tôi , Sự quan trọng của cảm xúc và tính nghệ thuật trong cách nhìn của De Sica là rất lớn đến nỗi ông chọn nó để ăn mừng sinh nhật thứ 50 của mình với ba đứa con tuổi teen và 30 người bạn của chúng , " Để , " ông nói , truyền sự quan tâm và niềm hy vọng cho thế hệ tiếp theo . " -Trong cảnh cuối của " Phép màu ở Milan " những người trong khu ổ chuột đã nổi lên bầu trời trên những cây chổi bay . -Sáu mươi năm sau khi bộ phim được làm ra và 30 năm sau lần đầu tiên tôi xem nó , tôi thấy những gương mặt trẻ nghiêng lên trong sự kinh ngạc nỗi nghi ngờ của chúng hợp với nỗi nghi ngờ của tôi . -Và tốc độ mà chúng liên hệ nó với " Triệu phú khu ổ chuột " hay những khu phố " favela " ở Rio nói lên bản chất bền vững đó . -Trong mùa chiếu của Câu lạc bộ phim về dân chủ và chính quyền , chúng tôi đã chiếu " Ông Smith đến Washington . 
" -Được làm vào năm 1939 , bộ phim có tuổi già hơn tuổi của hầu hết ông bà của các thành viên -Sự cổ điển của Frank Capra có giá trị ở tính độc lập và sự thích nghi . -Bộ phim chỉ ra làm thế nào để làm đúng , làm thế nào để trở nên kì lạ phi thường . -Nó cũng là cách diễn tả về lòng tin coi bộ máy chính trị như nguồn gốc danh dự . -Không lâu sau đó " Ông Smith " trở thành bộ phim kinh điển của Câu lạc bộ phim , Có một tuần tất cả các buổi tối " cản trở lại các luật lệ " ở Toà nhà Nhà cầm quyền . -và thật vui vô cùng khi chúng tôi thấy những bạn trẻ trên khắp đất nước giải thích với nhà cầm quyền rằng cản trở các đạo luật là gì và tại sao các nhà cầm quyền có thể định giờ ngủ của họ theo một nguyên tắc nào đó . -Nói chung thi Jimmy Stewart đã cản trở các đạo luật trong toàn bộ 2 bộ phim cơ mà . -Bằng cách chọn " Khách sạn Rwanda " bọn trẻ đã khám phá về tôi diệt chủng ở dạng thú tính nhất . -Nó gây ra những giọt nước mắt và khơi gợi những câu hỏi thâm thuý về những đội quân bảo vệ hoà bình không vũ khí và sự lừa gạt của xã hội phương tây khi đối diện với cuộc đấu tranh đạo đức với những tiện nghi thực dụng trong đầu -Và khi " Bản danh sách của Schindler " khiến bọn trẻ không bao giờ quên , một đứa trẻ , với đầy sự đau đớn tỉnh táo , nhận xét rằng , " Chúng ta đã quên mất rồi , nếu không thì làm thế nào mà " Khách sạn Rwanda " lại xảy ra ? " -Khi bọn trẻ xem nhiều phim hơn , cuộc sống của chúng phong phú hơn . -" Kẻ móc túi " bắt đầu một cuộc tranh cãi về việc tước quyền công dân của tội phạm . -" Gửi ngài , với sự yêu mến " đốt cháy khán giả tuổi thành niên của bộ phim . -Chúng ăn mừng sự thay đổi về thái độ đối với những người Briton không phải da trắng , nhưng chửi rủa hệ thống trường học không ngơi nghỉ của họ không có giá trị bản sắc cộng đồng , không giống như sự giám hộ cẩn trọng của Sidney Potier mang lại . 
-giờ đây , những đứa trẻ sâu sắc , có chính kiến và tò mò này không nghĩ gì ngoài việc nắm lấy những bộ phim -- đen trắng , phụ đề , tài liệu , phi tường thuật hay tưởng tượng -- và không nghĩ gì về viết những bài nhân xét chi tiết tranh đua nói về những bộ phim yêu thích bằng những bài văn xuôi đam mê và càng ngày càng triết lý . -6000 bản nhận xét mỗi tuần ở từng trường ganh đua cho sự vinh dự được thành bài nhận xét của tuần . -Từ 25 câu lạc bộ , chúng tôi đã có hàng trăm , rồi hàng ngàn , cho đến khi chúng tôi có gần một phần tư triệu đứa trẻ trong 7,000 câu lạc bộ dọc đất nước . -Mặc dù những con số đó đã , và tiếp tục tăng một cách đáng kinh ngạc , điều đã trở nên kinh ngạc hơn nữa là làm thế nào sự trải nghiệm về những câu hỏi phê bình tò mò được chuyển tải vào cuộc sống . -Một vài đứa trẻ của chúng tôi đã bắt đầu nói chuyện với bố mẹ chúng , một số nói với giáo viên , hoặc bạn bè của chúng . -Và với những em không có bạn , bắt đầu kết bạn . -Những bộ phim đem lại sự liên kết ở tất cả những dạng bị chia cắt . -Và các câu chuyện , chúng đã giúp cung cấp những kinh nghiệm mang tính chia sẻ . -" Persepolis " mang một bé gái đến gần hơn với người mẹ Iran của mình và " Hàm cá mập " trở thành cách mà một câu bé nhỏ tuổi có thể nói lên nỗi sợ mà cậu đã trải qua về bạo lực trong một chuyến bay đã giết chết đầu tiên là bố rồi đến cả mẹ của cậu , mẹ cậu đã bị ném qua mạn tàu trong một chuyến đi tàu -Ai đã đúng , ai sai ? -Họ sẽ làm gì nếu bị đặt dưới tình trạng tương tự ? -Câu chuyện kể có hay không ? -Có thông điệp ẩn dấu nào trong đó ? -Làm thế nào thế giới thay đổi ? Làm thế nào nó có thể khác đi ? -Cơn bão các câu hỏi đã được bay tới tấp từ miệng của những đứa trẻ những người mà thế giới từng nghĩ sẽ chẳng quan tâm -Và chúng không tự biết rằng chúng quan tâm . -Và khi chúng viết và tranh luận , hơn là thấy những bộ phim như là những tạo tác , chúng bắt đầu nhìn thấy bản thân . -Tôi có một người cô là một người kể chuyện tuyệt vời . 
-Trong một lúc cô có thể đánh thức những hình ảnh như chạy chân trần trên núi Bàn và chơi trò cảnh sát và kẻ cướp . -Khá gần đây cô có bảo tôi rằng vào năm 1948 , hai trong số người chị em của cô và bố tôi đã du lịch trên một chiếc thuyền đến Israel mà không có ông bà tôi . -Khi đoàn thuỷ thủ nổi loạn trên biển vì nhu cầu thiết yếu của con người chính là những thiếu niên này đã cho đoàn thuỷ thủ ăn . -Tôi đã hơn 40 khi bố tôi mất . -Ông không bao giờ đề cập đến chuyến đi đó . -Mẹ của mẹ tôi đã rời khỏi châu Âu trong một nạn đói mà không có chồng của bà , nhưng với đứa con gái 3 tuổi và kim cương khâu viền trên váy . -Sau 2 năm lẩn trốn , ông tôi xuất hiện ở Luân Đôn . -Ông đã không bao giờ đúng nữa . -Và câu chuyện của ông đi vào im lặng khi ông bị đồng hoá . -Câu chuyện của tôi bắt đầu ở nước Anh với lý lịch tạm trong sạch và sự im lặng của bố mẹ là người nhập cư . -Tôi có " Anne Frank " , " Sự trốn thoát vĩ đại " , " Shoah " , " Chiến thắng của nhà Will " -Đó là Leni Riefenstahl trong ngôi chùa Nazi tao nhã tạo ra bối cảnh mà gia đình đó phải chịu đựng . -Những bộ phim này mang đến nỗi đau quá lớn đến không nói nổi thành lời và chúng trở nên hữu ích cho tôi hơn hàng ngàn lời thì thầm của những người sống sót và cái nhìn thoáng qua không thường xuyên vào hình xăm trên cánh tay người cô -Người theo chủ nghĩa thuần tuý có lẽ cảm thấy rằng sự giả tưởng xua tan nhu cầu hiểu thật sự của con người rằng phim quá thô thiển để nói về những câu chuyện phức tạp và chi tiết , hay những nhà làm phim luôn phục vụ sự cường điệu hơn là sự thật . -Nhưng trong những cuộn phim là mục đích và ý nghĩa -Như một đứa trẻ 12 tuổi nói sau khi xem " Phù thuỷ xứ Oz " " Mọi người nên xem phim này , bới vì nếu không xem mọi người sẽ có thể không biết mình cũng có trái tim " -Chúng ta xem trọng việc đọc sách , tại sao không xem trọng việc xem phim với niềm đam mê ? -Hãy xem " Công dân Kane " có giá trị như Jane Austen . 
-Hãy đồng ý rằng " Những cậu bé và khu dân cư " giống như Tennyson , đem lại khung cảnh xúc động và sự thấu hiểu cao độ rằng chúng phối hợp được với nhau . -mỗi một mảnh của nghệ thuật đáng nhớ , mỗi một viên gạch của bức tường về chúng ta là ai . -Và được thôi nếu chúng ta nhớ Tom Hanks hơn nhà du hành vũ trụ Jim Lovell hay đặt khuôn mặt của Ben Kíngléy chồng lên mặt của Gandhi -Và dù không có thật , Eve Harrington , Howard Beale , Mildred Pierce là cơ hội để khám phá là con người thì như thế nào và không hề bớt hữu ích khi hiểu về cuộc sống và thời gian của chúng ta như Shakespeare rọi sáng thế giới của Elizabeth nước Anh . -Chúng ta đoán rằng phim , nơi những câu chuyện là nơi hội tụ của kịch tích , âm nhạc , văn học , kinh nghiệm con người , có thể tham gia truyền nguồn cảm hứng cho những đứa trẻ tham gia trong FILMCLUB , -Cái mà chúng tôi không nhìn thấy trước được là sự phát triển có thể đo đạc được trong hành vi , sự tự tin và kết quả học tập . -Những học sinh từng miễn cưỡng giờ đây đến trường , nói chuyện với giáo viên của họ , đánh nhau , không phải ở sân chơi , mà là để chọn bộ phim chiếu vào tuần tới -- những đứa trẻ tìm thấy được định nghĩa bản thân , sự tham vọng và muốn học và tham gia vào cộng đồng từ những câu chuyện chúng xem . -Thành viên của chúng tôi thách thức sự mô tả nhị phân về cách mà chúng ta thường mô tả những đứa trẻ của chúng ta . -Chúng không hoang dã hay tập trung quá nhiều vào bản thân . -Chúng như những đứa trẻ khác , đang thương lượng với thế giới về sự lựa chọn vô cùng , nhưng lại là một văn hoá bé nhỏ về cách để có một trải nghiệm có ý nghĩa -Chúng tôi ngạc nhiên trước những hành vi của những đứa trẻ tự định nghĩa mình bằng cỡ nấc giày , những gì thu nhận được có tính chất tường thuật mà chúng tôi đem lại . 
-Nếu chúng ta muốn những giá trị khác chúng ta phải kể một câu chuỵện khác , một câu chuyện hiểu được rằng một dạng tường thuật cá nhân là một thành phần cần thiết của một cá thể con người , và một dạng tường thuật tập thể là cần thành phần thiết cho một bản sắc văn hoá , và không có nó thì thật không thể tưởng tượng được bản thân là một phần của tập thể . -Bởi khi những đứa trẻ này về nhà sau khi xem xong " Cửa sổ kì lạ " và đưa ánh nhìn của chúng vào toà nhà bên cạnh , chúng có những công cụ để tự hỏi , ngoài chúng ta có ai ngoài kia và câu chuyện của họ là gì . -Cảm ơn các bạn . -Ellen Jorgensen : Hack sinh học -- bạn cũng có thể làm được việc đó . -Chúng ta đã có máy tính cá nhân , tại sao không phải là công nghệ sinh học cá nhân ? Đó là điều mà nhà sinh vật học Ellen Jorgensen và cộng sự của bà tự hỏi khi lập ra Genspace , một phòng thí nghiệm sinh học phi lợi nhuận ở Brooklyn dành cho khoa học cộng đồng , nơi mà những người nghiệp dư có thể đến với công nghệ sinh học . Khác xa so với phòng thí nghiệm xấu xa của Frankenstein , Genspace cung cấp một danh sách dài những lợi ích vui nhộn , sáng tạo và thiết thực của DIYbio . -Thời đại này rất tuyệt để làm một nhà sinh học phân tử . -Việc đọc và viết trình tự DNA đang trở nên dễ dàng hơn và rẻ hơn . -Đến cuối năm nay , chúng ta sẽ có thể giải trình tự của 3 triệu mã thông tin từ bộ gen của mình trong vòng một ngày với giá ít hơn 1 ngàn euro . -Công nghệ sinh học có lẽ là ngành công nghệ mạnh nhất và phát triển nhanh nhất . -Nó có tiềm năng để thay thế nhiên liệu hoá thạch , cách mạng hoá y học , và ảnh hưởng đến mọi khía cạnh của cuộc sống hằng ngày của chúng ta . -Vậy ai là người sẽ thực hiện nó ? -Tôi nghĩ tất cả chúng ta đều khá yên tâm khi người này làm việc đó . -Nhưng còn người này thì sao ? -Vào năm 2009 , tôi lần đầu tiên biết đến DIYbio . -Đó là một phong trào khuyến khích đưa công nghệ sinh học đến với tất cả mọi người , không chỉ với các nhà khoa học và những người trong phòng thí nghiệm của chính phủ . 
-Ý tưởng ở đây là nếu chúng ta mở rộng cánh cửa khoa học và cho phép nhiều nhóm người khác nhau tham gia , nó sẽ thúc đẩy sự sáng tạo . -Đưa công nghệ vào tay những người sử dụng thường là một ý tưởng tốt vì họ biết rõ nhất về nhu cầu của chính họ . -Đây là một công nghệ tinh vi đi cùng với nó là những câu hỏi về mặt xã hội , đạo đức và đạo lý , và các nhà khoa học thì rất dở trong việc giải thích với công chúng một cách chính xác họ đang làm gì trong phòng thí nghiệm . -Do đó , phải chăng sẽ tốt hơn nếu có một nơi gần nhà mà bạn có thể đến và tìm hiểu về những điều này , và tự tay làm ? -Tôi nghĩ là tốt . -Nên 3 năm trước , tôi cùng với vài người bạn cùng chung lý tưởng lập nên Genspace . -Đó là một phòng thí nghiệm công nghệ sinh học cho cộng đồng và phi lợi nhuận , tại Brooklyn , New York , với ý tưởng là mọi người có thể đến , để tham dự những lớp học và " vọc " trong phòng thí nghiệm trong một môi trường cởi mở và thân thiện . -Không một kinh nghiệm nào trước đây có thể giúp tôi chuẩn bị cho những gì xảy ra sau đó . Bạn có thể đoán được không ? -Báo giới bắt đầu gọi chúng tôi . -Và chúng tôi càng nói về lợi ích của việc tăng sự hiểu biết về khoa học , thì họ lại càng muốn nói về việc chúng tôi đang tạo ra một Frankenstein mới , và kết quả là , trong vòng 6 tháng sau đó , khi bạn google tên tôi , thay vì tìm thấy những bài báo khoa học của tôi , bạn tìm được : -[ Tôi có phải là hiểm hoạ sinh học ? ] Việc đó thật đáng buồn . -Điều duy nhất giúp chúng tôi vượt qua giai đoạn đó là chúng tôi biết rằng trên toàn thế giới , có nhiều người cũng đang cố gắng làm cùng việc mà chúng tôi đang làm . -Họ cũng mở những nơi dành cho hacker sinh học , và một số còn phải đối mặt với những thử thách lớn hơn chúng tôi , nhiều quy định hơn và ít tài nguyên hơn . -Nhưng giờ đây , ba năm sau , đây là thành quả của chúng tôi . -Một không gian sôi động dành cho cộng đồng hacker sinh học toàn thế giới , và đây chỉ là bước đầu . 
-Đây là một trong những cộng đồng lớn nhất , và có nhiều nơi khác đang mở cửa hằng ngày . -Có một chỗ có lẽ sắp mở cửa ở Moscow , một ở Hàn Quốc , và điều thú vị là mỗi nơi đều có đặc điểm riêng của mình được phát triển dựa trên cộng đồng của họ . -Hãy để tôi đưa các bạn thăm quan một chút . -Những hacker sinh học làm việc đơn lẻ . -Chúng tôi làm việc theo nhóm , trong những thành phố lớn - - và trong những ngôi làng nhỏ . -Chúng tôi tự chế dụng cụ phòng thí nghiệm . -Chúng tôi áp dụng kỹ thuật di truyền trên vi khuẩn . -Chúng tôi hack phần cứng , phần mềm , phần ướt , và dĩ nhiên hack luôn mã sinh học . -Chúng tôi thích xây dựng mọi thứ . -Sau đó chúng tôi lại thích tháo rời mọi thứ ra . -Chúng tôi làm cho nhiều thứ phát triển . -Chúng tôi làm cho nhiều thứ phát sáng . -Và chúng tôi còn làm cho tế bào nhảy múa . -Tinh thần của những phòng thí nghiệm này là cởi mở và tích cực , nhưng đôi lúc khi người ta nghĩ đến chúng tôi , thì điều đầu tiên họ nghĩ đến là an toàn sinh học , an ninh sinh học và những điều đen tối khác . -Tôi sẽ không coi thường những quan ngại này . -Bất kì kỹ thuật nào cũng như con dao hai lưỡi , và , bạn biết rằng , khi bạn có những thứ như sinh học tổng hợp , công nghệ sinh học nano , bạn buộc phải nhìn vào không chỉ những nhóm nghiệp dư mà cả những nhóm chuyên nghiệp , vì họ có cơ sở hạ tầng tốt hơn , họ có điều kiện thuận lợi hơn , và họ có thể tiếp cận các tác nhân gây bệnh . -Vậy Liên Hiệp Quốc đã làm đúng như vậy , và gần đây họ làm một báo cáo trên toàn lĩnh vực này , và điều họ kết luận là sức mạnh tích cực của công nghệ này lớn hơn nhiều những sự mạo hiểm tiêu cực , và họ thậm chí còn đặc biệt xem xét cộng đồng DIYbio , và họ ghi chú rằng , không đáng ngạc nhiên lắm , báo chí có xu hướng đánh giá cao khả năng của chúng ta và đánh giá thấp đạo đức của chúng ta . -Sự thật là , thành viên DYI từ khắp thế giới , Mỹ , Châu Âu , tụ tập lại vào năm ngoái , và chúng tôi đặt ra một bộ luật đạo đức chung . 
-Và nó nhiều hơn rất nhiều những gì khoa học thông thường đã làm được . -Giờ đây , chúng tôi tuân thủ luật pháp của bang và luật địa phương . -Chúng tôi xử lí rác thải một cách hợp lí , chúng tôi tuân thủ các quy trình an toàn , chúng tôi không làm việc với những tác nhân gây bệnh . -Bạn thấy đấy , nếu bạn đang làm việc với tác nhân gây bệnh , thì bạn không nằm trong cộng đồng của chúng tôi , bạn nằm trong cộng đồng khủng bố sinh học , tôi xin lỗi . -Và đôi khi có người hỏi tôi , " Vậy nếu có tai nạn thì sao ? " -À , làm việc với những sinh vật an toàn như chúng tôi thường tiếp xúc thì khả năng xảy ra tai nạn , ví dụ như người nào đó vô tình tạo ra một loại siêu bọ , khả năng này cũng tương đương như khả năng một trận bão tuyết xảy ra giữa sa mạc Sahara . -Nó có thể xảy ra , nhưng tôi không có ý định để đời tôi phụ thuộc vào khả năng đó . -Thật ra tôi đã chọn một loại mạo hiểm khác . -Tôi đăng kí một chương trình gọi là Dự Án Gen Cá Nhân . -Đó là một nghiên cứu ở Havard mà khi kết thúc , họ sẽ lấy toàn bộ trình tự gen của tôi , toàn bộ thông tin y học của tôi , và nhận dạng của tôi , và họ sẽ đăng nó lên mạng cho mọi người thấy . -Có rất nhiều nguy cơ liên quan mà họ nói đến trong phần thông báo sự chấp thuận . -Điều mà tôi thích nhất là , ai đó có thể tải trình tự gen của tôi xuống , trở lại phòng thí nghiệm , tổng hợp một số ADN giả và đặt nó tại hiện trường một vụ án . -Nhưng giống như DIYbio , kết quả tích cực và tiềm năng tốt của một nghiên cứu như vậy lớn hơn sự mạo hiểm rất nhiều . -Bây giờ , có thể bạn đang tự hỏi , " Bạn biết không , tôi sẽ làm gì trong một phòng thí nghiệm sinh học ? " -À , không lâu trước đây chúng ta từng tự hỏi , " Nào , ai sẽ làm được gì với một chiếc máy tính cá nhân chứ ? " . -Vậy đây chỉ mới là sự bắt đầu . -Chúng ta chỉ mới thấy đỉnh của tảng băng DNA . -Để tôi cho bạn thấy bạn có thể làm gì ngay bây giờ . -Một nhà biohacker người Đức , một nhà báo , muốn biết chó của ai đã để lại những " món quà " nho nhỏ trên đường ? 
-Phải , bạn đã đoán ra . Ông ta quăng quả banh quần vợt vào tất cả các con chó trong khu phố , phân tích nước bọt , xác định con chó , và đối chất với chủ của con chó . -Tôi phát hiện ra một loài sinh vật đã xâm lược sân sau nhà tôi . -Trông như một con bọ hung nhỉ ? -Thật ra nó là một con bọ cánh cứng Nhật Bản . -Và cũng loại công nghệ đó -- được gọi là mã vạch ADN , nó thật sự tuyệt vời -- Bạn có thể dùng nó để kiểm tra xem trứng cá muối của bạn có thật là từ cá tầm không , xem miếng sushi đó có thật là cá ngừ không , hoặc pho mát dê mà bạn mua rất đắt đó có thật là từ dê không . -Trong không gian của một nhà biohacker , bạn có thể phân tích gen của bạn để tìm đột biến . -Bạn có thể phân tích ngũ cốc của bạn để tìm thực phẩm biến đổi gen , và bạn có thể khám phá tổ tiên của mình . -Bạn có thể thả khí cầu thời tiết lên tầng tĩnh khí , thu thập vi khuẩn , xem điều gì đang xảy ra trên đó . -Bạn có thể làm ra một dụng cụ kiểm duyệt sinh học từ men để phát hiện chất gây ô nhiễm trong nước . -Bạn có thể làm ra một loại pin nhiên liệu sinh học . -Bạn có thể làm rất nhiều thứ . -Bạn còn có thể thực hiện các dự án khoa học nghệ thuật . Một vài trong số đó thật sự rất ngoạn mục , và chúng nhìn nhận các vấn đề xã hội và sinh thái từ một góc nhìn hoàn toàn khác biệt . -Điều đó thật sự rất tuyệt . -Vài người hỏi tôi , à , tại sao tôi lại tham gia vào việc này ? -Tôi có thể có một sự nghiệp hoàn hảo trong các ngành khoa học chính . -Vấn đề là , có một điều gì đó trong các phòng thí nghiệm này mà bạn không thể tìm thấy ở bất cứ nơi nào khác . -Có một điều gì đó thiêng liêng về nơi mà bạn có thể thực hiện một dự án , và bạn không phải biện minh với bất kì ai rằng nó sẽ đem về rất nhiều tiền , rằng nó sẽ cứu nhân loại , hoặc thậm chí là nó có thể thực hiện được . -Nó chỉ cần tuân thủ những quy tắc an toàn . -Nếu bạn có những không gian như thế này trên toàn thế giới , nó sẽ thật sự thay đổi nhận định về việc ai được phép làm công nghệ sinh học . 
-Chính những nơi như thế này đã sản sinh ra máy tính cá nhân . -Vậy sao không phải là công nghệ sinh học cá nhân ? -Nếu mọi người trong phòng này đều tham gia , ai mà biết được chúng ta có thể làm những gì ? -Đây là một lĩnh vực rất mới , và như chúng tôi nói ở Brooklyn , bạn còn chưa thấy gì cả đâu . . -Một người huýt gió bạn chưa từng biết đến -Tại TEDxRotterdam , nhà quán quân thổi sáo thế giới Geert Chatrou biểu diễn bài " Eleonora " của A.Honhoff , và bài " Fête de la Belle " do ông tự sáng tác . Trong , ông ấy trò chuyện về cái mà đã thôi thúc ông đến với thổi sáo . -Cám ơn rất nhiều -Đó là huýt sáo -Tôi đang cố huyết sáo bằng tiếng Anh -Người đàn ông tròn trịa , tóc xoăn đến từ Hà Lan này là ai -- tại sao ông ấy lại huýt sáo ? -Thực ra , tôi huýt gió kể từ khi tôi bốn tuổi -- khoảng tầm bốn tuổi -Bố tôi từng lúc nào cũng huýt gió khắp nơi trong nhà and tôi cứ ngỡ đó là một phần trong cách giao tiếp của gia đình tôi . -Vì vậy tôi huýt gió cùng với ông ấy . -Và thậm chí cho đến khi tôi 34 tuổi , lúc nào tôi cũng quấy rầy và khiến mọi người khó chịu với việc huýt sáo của mình . Bởi vì , thành thật mà nói , huýt gió có thể coi là một tật xấu của tôi . -Tôi huýt gió một mình , tôi huýt gió trong lớp học , -tôi huýt gió lúc đang đi xe đạp , tôi huýt gió ở mọi nơi . -Và tôi huýt gió ở một buổi tiệc đón giáng sinh nữa với gia đình thông gia của tôi . -Và tôi nghĩ , họ có những bài nhạc giáng sinh thật tệ . -Và một khi tôi nghe nhạc mà tôi không thích , tôi cố để làm cho nó hay hơn . -Bài " Con tuần lộc mũi đỏ Rudolph " -- bạn biết bài đó chứ ? -Nhưng nó còn có thể biến tấu như thế này . -Nhưng trong một buổi tiệc giáng sinh kia -- thật ra là một bữa tối -- bữa ấy làm tôi rất khó chịu . -Chị dâu tôi nhắc tôi một vài lần , " Làm ơn đừng huýt gió nữa . " -Nhưng tôi cứ không thể dừng được . -Và tại một thời điểm -- và lúc đó tôi đã uống một ít rượu , tôi phải thừa điều đó -- tại thời điểm đó tôi nói , " Nếu có một cuộc thi , em sẽ tham gia . 
" -và hai tuần sau Tôi nhận được một tin nhắn : " Em sẽ đi Mỹ đó . " -Đựoc thôi , tôi chuẩn bị được đi Mỹ . -Tôi rất thích , nhưng tại sao ? -Vì vậy tất nhiên tôi gọi chị ấy ngay lập tức . -chị ấy tìm trên google , và chị ấy tìm thấy cuộc thi huýt sáo thế giới ở Mỹ , tất nhiên . -Chị ấy khồng nghĩ tôi sẽ đi . -Và tôi sẽ mất mặt lắm . -Tôi không biết câu đó có đúng ngữ pháp tiếng Anh không . -Nhưng những người nói tiếng Hà Lan ở đây sẽ hiểu ý tôi muốn nói gì . -Tôi đã rất mất mặt -Và cô ấy nghĩ , " Anh ta chắc sẽ khồng bao giờ đi đâu . -Nhưng thật ra tôi đã đi . -Tôi đến Louisburg ở miền bắc Carolina , phí đông nam nước Mỹ. và tôi bước vào thế giới huýt sáo -và tôi cũng tham gia cuộc thi huýt gió thế giới và tôi đã chiến thằng ở đó năm 2004 . -Điều đó -- Điều đó thật tuyệt , tất nhiên . -Và để bảo về danh hiệu của mình -- như judokas do và những vận động viên thể thao -- Tôi nghĩ , năm 2005 hãy quay trở lại đó xem sao , và tôi tiếp tục giành giải quán quân . -Sau đó tôi không thể tham dự cuộc thi trong vòng vài năm . -Và năm 2008 tôi tham gia một lần nữa ở Nhật , thành phố Tokyo , và tôi lại giành giải quán quân . -Và chuyện mới xảy ra ở đây là tôi đứng ở đây , ở Rotterdam , ở một thành phố xinh đẹp , đứng trên một sân khấu lớn , và nói về chuyện huýt gió . -Và trên thực tế tôi kiếm tiền từ huýt gió , tại thời điểm này . -Vì vậy tôi bỏ việc làm của tôi là làm y tá . -Và tôi cố sống giấc mơ của tôi -- nhưng thực ra , đó chưa bao giờ là giấc mơ của tôi , nhưng điều đó nghe thật tuyệt . -Okay , tôi không phải là người huýt gió duy nhất ở đây . -bạn nghĩ , " Hả , ý bạn là gì ? " -Thật ra , bạn sẽ huýt gió cùng tôi -Và sau đó như thường lệ : mọi người nhìn nhau và nghĩ , " Trời đất ơi " -Tại sao ? Tôi có thể đi về được không ? " -Không , bạn không thể -Thật ra huýt gió rất đơn giản . -Bản nhạt mà tôi sẽ huýt theo được gọi là " Fête de la Belle . " -Bản nhạc này dài khoàng 80 phút . -Không không không . Nó chỉ dài bốn phút thôi . 
-Và tôi muốn luyện tập cho bạn cách huýt gió trước . -Vậy thì tôi sẽ huýt tông của bài nhạc nhé . -Xin lỗi . Tôi quên mất một điều . -Bạn huýt với cái tông giống tôi . -Tôi đã từng nhiều giọng khác nhau , -Đây thật sự là một điều rất hứa hẹn . -Đây thật sự là một điều rất hứa hẹn , -Tôi sẽ hỏi những người ở phía sau cánh gà mở nhạc lên . -Và nếu nhạc bắt đầu , tôi sẽ chỉ chỗ nào để bạn huýt gió theo. và chúng ta sẽ thấy điều gì sẽ xảy ra , -Oh , hah . -Tôi thật sự xin lỗi các bạn kĩ sự ở sau cánh gà . -Tôi đã quen với việc đó rồi . -Tôi sẽ tự bắt đầu -Được rồi , chuẩn bị . -Được rồi . -Rất dễ phải không ? -Phần sô lô chuẩn bị đến . Tôi nghĩ tôi nên làm phần đó . -Max Westerman : Geert Chartrou , nhà vô địch thổi sáo thế giới . -Geert Chatrou : Cám ơn . Cám ơn . -Roberto D 'Angelo + Francesca Fedeli : Bài học cuộc đời từ căn bệnh của con trai tôi . -Roberto D 'Angelo và Francesca Fedeli từng nghĩ con trai Mario của họ khoẻ mạnh -- cho đến khi bé được 10 ngày tuổi , họ phát hiện bé bị đột quỵ sơ sinh . Với Mario không thể điều khiển phần cơ thể bên trái , họ vật lộn với những băn khoăn : Bé có " bình thường " trở lại ? Bé có sống trọn vẹn một đời không ? Một câu chuyện thấm thía về việc cha mẹ đối diện với sự sợ hãi -- và cách họ xoay chuyển chúng . -Francesca Fedeli : Xin chào . -Đây là Mario . Con trai chúng tôi . -Bé mới được 2 tuổi rưỡi , tôi đã có khoảng thời gian mang bầu thật khó khăn vì phải nằm trên giường gần 8 tháng . -Nhưng cuối cùng thì mọi việc dường như đã ổn . -Nên bé sinh ra được đủ cân nặng . -Bé có chỉ số Apgar tốt . -Nên chúng tôi khá an tâm . -Nhưng cuối cùng , 10 ngày sau khi sinh , chúng tôi phát hiện bé bị đột quỵ . -Như bạn biết đấy , đột quỵ là một sự tổn thương não . -Đột quỵ sơ sinh là một chuyện có thể xảy ra trong 9 tháng mang thai hoặc bất ngờ ngay sau khi sinh , trong trường hợp của bé , như bạn thấy , phần não phải của bé đã không còn . 
-Hậu quả của cú đột quỵ đối với cơ thể của Mario có thể tệ đến mức Mario sẽ không còn có thể sử dụng được phần cơ thể bên trái nữa . -Hãy tưởng tượng , nếu bạn có một máy tính và một máy in và bạn muốn gửi , muốn in ra một tài liệu , nhưng máy in không có những ổ đĩa phù hợp , Mario cũng vậy . -Giống như , bé muốn di chuyển phần cơ thể bên trái nhưng không thể chuyển giao tín hiệu để chuyển động tay và chân trái . -Vậy là cuộc sống đã phải thay đổi . -Chúng tôi phải thay đổi kế hoạch của mình . -Chúng tôi phải thay đổi những ảnh hưởng của việc sinh nở này với đời mình . -Như bạn tưởng tượng , không may là chúng tôi chưa sẵn sàng . -Không ai dạy chúng tôi xử lý những khuyết tật như vậy thế nào , và ngày càng có nhiều băn khoăn bắt đầu chiếm lấy tâm trí chúng tôi . -Đó thực sự là quãng thời gian khó khăn . -Những băn khoăn , những điều cơ bản như , bạn biết đấy , tại sao chuyện này xảy đến với chúng tôi ? -Vấn đề ở chỗ nào ? -Một số băn khoăn nặng nề hơn , như là cuộc sống của Mario sẽ bị ảnh hưởng thế nào ? -Ý tôi là , rồi bé sẽ có thể làm việc không ? -Bé sẽ có thể bình thường trở lại ? -Và , như bạn biết , là cha mẹ , nhất là lần đầu tiên , tại sao bé sẽ không trở nên tốt hơn so với chúng tôi ? -Và điều này , thật ra là , thực sự rất khó nói ra , nhưng vài tháng sau , chúng tôi nhận ra chúng tôi thực sự cảm thấy như mình thất bại . -Ý tôi là thành quả thực tế duy nhất của đời mình , cuối cùng , lại là một thất bại . -Và bạn biết đó , nó không phải là thất bại với chính bản thân chúng tôi mà là thất bại sẽ ảnh hưởng đến suốt đời Mario . -Thành thật mà nói , chúng tôi thất vọng . -Ý tôi là thực sự thất vọng , nhưng cuối cùng , chúng tôi nhìn lại bé , và cùng nói , chúng ta phải hành động . -Ngay lập tức , như Francesca đã nói , chúng tôi thay đổi đời mình . -Chúng tôi bắt đầu liệu pháp phục hồi chức năng và vật lí trị liệu. và một trong những hướng mà chúng tôi đang theo đuổi trong vật lí trị liệu là hướng dẫn neuron đối chiếu . 
-Về cơ bản , chúng tôi làm việc này cùng Mario trong nhiều tháng . -Bạn có một đồ vật , chúng tôi chỉ cách cho bé làm thế nào để nắm lấy các đồ vật đó . -Vâng , lý thuyết neuron đối chiếu nói một cách đơn giản rằng trong não bạn , chính xác như bây giờ , khi bạn thấy tôi làm thế này , bạn kích hoạt đúng những neuron như tôi giống như chính bạn đang làm vậy . -Có vẻ như đây là một sự đột phá trong vật lí trị liệu . -Nhưng một hôm , chúng tôi phát hiện ra Mario không nhìn vào tay chúng tôi . -Bé nhìn vào chúng tôi . -Chúng tôi là tấm gương của bé . -Và vấn đề , như có thể bạn thấy , rằng chúng tôi thất vọng , chúng tôi suy sụp , chúng tôi đã coi bé như một vấn đề chứ không phải là một bé trai , không từ một nhận thức tích cực . -Ngày hôm đó thực sự thay đổi nhận thức của chúng tôi . -Chúng tôi nhận ra mình phải trở thành một tấm gương tốt hơn cho Mario . -Chúng tôi bắt đầu lại từ nghị lực , đồng thời , bắt đầu lại từ khả năng của bé . -Chúng tôi không còn coi bé là một vấn đề nữa , và chúng tôi bắt đầu coi bé như một cơ hội để trở nên tốt hơn . -Và thực sự , điều này đã thay đổi , từ phía chúng tôi , chúng tôi nói , " Chúng ta có khả năng gì để trao cho Mario ? " -Và chúng tôi bắt đầu từ mong muốn của mình . -Ý tôi là , cuối cùng , vợ tôi và tôi cũng khá khác nhau , nhưng chúng tôi có nhiều điểm chung . -Chúng tôi thích du lịch , yêu âm nhạc , thích ở những nơi thế này , và bắt đầu đưa Mario đi cùng chỉ để cho cháu thấy những gì tốt nhất chúng tôi có thể chỉ cho cháu . -Video này được quay từ tuần trước . -Tôi không nói rằng -- -- Tôi không nói đó là phép màu . Đó không phải là thông điệp , vì chúng tôi mới chỉ bắt đầu chặng đường . -Nhưng chúng tôi muốn chia sẻ nhận thức cốt yếu , nhận thức cốt yếu mà Mario đã đưa chúng tôi tới , rằng hãy coi những gì bạn có một như món quà chứ không phải là những gì bạn đã bỏ lỡ , và coi những gì bạn đã bỏ lỡ chỉ như một cơ hội . -Và đây là thông điệp mà chúng tôi muốn chia sẻ với các bạn . -Đây là lí do tại sao chúng tôi tới đây . 
-Mario ! -Và đây là lí do -- -- Và đây là lí do chúng tôi quyết định chia sẻ tấm gương tốt nhất trên đời với bé . -Cảm ơn các bạn rất nhiều , tất cả các bạn . -Xin cảm ơn . Cảm ơn . Tạm biệt . -Cảm ơn . -Mark Shaw : Bài biểu diễn rất khô -Mark Shaw giới thiệu chất Cực Khô , một dạng chất lỏng phủ bề ngoài có thể chống thấm nước và những chất có chứa nước . Ở mức độ phân từ nano , chất này bao phủ bề mặt vật liệu bởi lớp không khí bảo vệ , vì vậy nước không thấm được . Bạn hãy xem một màn biểu diễn tuyệt vời trong vòng hai phút nhé . -Hôm nay tôi đến đây để chỉ cho các bạn Những thứ bạn không thể thấy được nhưng rất thú vị khi nhìn vào đó -Các bạn sắp sửa thử một loại kĩ thuật mới và rất thú vị , làm cho chúng ta suy nghĩ lại chúng ta chống thấm nước như thế nào -Tôi có một khối đá xỉ ở đây một nửa đã được phủ lớp xịt vi phân tử - kĩ thuật nano có thể áp dụng cho hầu hết các loại vật liệu . -Đó gọi là chất Cực Khô và khi bạn bôi vào vật liệu nó sẽ hình thành lớp màng chống thấm nước -Đây là khối đá xỉ , trơn và bạn thấy nó rỗng , có thể hút nước . -Đây thì không . -Rỗng , không rỗng . -Vậy tính chống thấm nước là như thế nào ? -Đô chống thấm là khả năng đo một giọt nước trên bề mặt . -Giọt nước càng tròn , độ chống thấm càng cao , và nếu như rất tròn , nó chống thấm cực tốt . -Một xe vừa bôi sáp , những phân tử nước sụt xuống gần ̣ 90 độ . -Một lớp kinh chắn gió cho ta khoảng 110 độ . -Chúng ta đang nhìn thấy đây là 160 - 175 độ , và từ 150 độ trở đi là chống thấm cực tốt rồi . -Một phần của buổi thuyết trình , tôi có một đôi găng tay và chỉ phủ một chiếc với lớp chống thấm vi phân tử hãy xem xem chiếc găng nào nhé , tôi sẽ gợi ý -Bạn đoán ra chiếc nào khô rồi chứ ? 
-Khi chúng ta có kĩ thuật vi phân tử và khoa học vi phân tử chúng ta có thể nhìn thấy nguyên tử và phân tử và có thể điều khiển chúng để đem lại nhiều lợi ích -Chúng ta nói đến " rất nhỏ " ở đây -Khoa học vi phân tử có kích thước nanomét một nano mét bằng một phần tỉ mét , để đo chúng , nếu bạn có một phân tử nano , kích thước một nano mét , nếu đặt 50,000 hạt với nhau , sẽ bằng bề dày của sợi tóc . -Vậy thì rất nhỏ , nhưng rất hữu dụng . -Không chỉ có nước -còn rất nhiều chất có nước như bê tông , sơn có chứa nước , bùn , và một số loại dầu tinh chế nữa . -Bạn có thể thấy sự khác biệt . -Phần kế tiếp , chúng ta có tấm kính ô vuông này , và đã được phủ lớp chống thấm ở rìa bao phủ lớp chống thấm vi phân tử , và ta sẽ nhỏ thứ nước nhuộm xanh này vào chính giữa , bạn sẽ thấy , nó sẽ tràn ra ngoài tấm kính như bạn thường thấy , ngoại trừ phần đã được phủ và tôi không thể làm cho nó tràn ra . -Đó là tính sợ nước . -Vậy thì chuyện gì đã xảy ra vậy ? -Bề mặt của lớp phủ chứa những phân tử nano hình thành lớp bề mặt rất thô . -Bạn có thể nghĩ là nó trơn , nhưng không phải vậy . -Và có hàng tỉ khe hở giữa chúng , những kẽ hở này , và những phân tử nano chiếm lấy những phân tử không khí và bao phủ lớp ngoài bởi không khí . -Đó hình thành lớp ô bảo vệ vậy là nước tác động vào lớp không khí bùn , bê tông , đều trượt đi hết . -Nếu tôi cho tấm ván này vào nước , bạn thấy lớp bạc óng ánh xung quanh nó , và lớp bạc này chính là lớp không khí bảo vệ để nước khỏi chạm vào tấm ván , và nó hoàn toàn khô . -Vậy thì ứng dụng ở đây là gì ? -Có thể các bạn đang suy nghĩ trong đầu -Mọi người có thể rất phấn khởi , cho rằng , " Ồ , tôi có thể sử dụng vào việc này , việc nọ . " -Những ứng dụng thường thấy là những chất chống ẩm . -Ngày nay chúng ta đã thấy -những chất chống đóng băng , vì bạn không có nước , bạn không có băng . -Đó có thể là chất chống rỉ sét . -Không có nước , không có rỉ sét . -Có thể là chống vi khuẩn . -Không có nước , vi khuẩn không kí sinh được . 
-Và tất cả những chất tự rửa -Thử tưởng tượng những ứng dụng này có thể cải tiến lĩnh vực bạn đang làm . -Sau đây là bài biểu diễn cuối cùng của tôi , nhưng trước đó , tôi muốn cảm ơn tất cả các bạn , và nghĩ ít thôi nhé . -Sẽ được thôi . Từ từ . -Bạn biết chúng tôi cắt biển hiệu của chương trình TED chứ ? -[ Hai phút sau ... ] Ông ấy có rất nhiều nghiên cứu y học . -Được rồi ! -̣ -Dan Ariely bàn về đoạn mã đạo đức bị lỗi của con người -Nhà kinh tế học hành vi Dan Ariely nghiên cứu các lỗi trong đoạn mã phẩm chất đạo đức của chúng ta : lý do tiềm ẩn khiến chúng ta nghĩ sẽ chẳng sao nếu gian lận hoặc trộm đồ . Các nghiên cứu khéo léo giúp làm rõ ý tưởng của ông về việc chúng ta cư xử vô lý - và có thể bị ảnh hưởng theo những cách chúng ta không hiểu hết . -Hôm nay , tôi muốn nói 1 chút về tính phi lý có thể lường trước được -Niềm đam mê của tôi về lĩnh vực này bắt đầu cách đây nhiều năm trong bệnh viện . -Khi ấy tôi bị bỏng nặng . -Nếu chẳng may bạn phải dành nhiều thời gian ở bệnh viện , bạn sẽ bắt gặp nhiều dạng phi lý -Một điều đặc biệt gây khó khăn cho tôi tại khoa điều trị bỏng là quá trình các y tá tháo băng cho tôi . -Bây giờ , bạn phải tháo băng y tế tại 1 số vị trí , bạn phải cẩn thận xem xét đâu là cách làm đúng -Cách 1 : bóc nó ra nhanh - thời gian ngắn nhưng phải khá mạnh tay hoặc cách 2 : bạn tháo băng từ từ-- mất khá nhiều thời gian- nhưng từng giây qua đi bớt đau đớn hơn-- Vậy cách nào là đúng ? -Các y tá trong khoa tôi nằm cho rằng phương pháp đúng nhất là cách 1 , họ giữ 1 đầu và bắt đầu bóc và giữ 1 đầu rồi bóc . -Vì tôi bị bỏng 70 % cơ thể nên mất khoảng 1 tiếng tháo băng . -Như bạn có thể tưởng tượng tôi căm ghét cái khoảnh khắc bóc toạc với 1 sức mạnh kinh hồn . -Và tôi sẽ cố gắng lý sự với họ " Tại sao chúng ta không thử cách khác ? " -" Tại sao chúng ta không làm lâu hơn 1 chút 2 tiếng thay vì 1 tiếng , và nhẹ tay hơn ? " -Và các y tá nói với tôi 2 điều . 
-Họ nói rằng mẫu bệnh nhân đúng mực là những người tin tưởng vào các y tá luôn thao tác đúng để giảm đau tối đa và họ cũng nói rằng bệnh nhân không nên gợi ý hay can thiệp , hoặc ... -Đây không phải bằng chữ Hebrew -Nó bằng mọi thứ ngôn ngữ tôi từng biết -Và , bạn biết đấy , không có nhiều nhiều thứ tôi có thể làm và họ tiếp tục làm công việc của mình . -Và khoảng 3 năm sau , khi tôi ra viện , tôi đã bắt đầu học đại học -Và 1 trong số các bài học thú vị nhất tôi đã học là phương pháp thử nghiệm nghĩa là nếu bạn nghi vấn điều gì , bạn có thể tạo 1 bản mô phỏng nghi vấn một cách trừu tượng , bạn có thể cố gắng kiểm tra nghi vấn , có thể học được chút gì về thế giới . -Đó là những gì tôi đã làm . -Tôi vẫn rất quan tâm đến câu hỏi làm cách nào để tháo băng y tế cho bệnh nhân bỏng . -Ban đầu tôi không có nhiều tiền , vì thế tôi đã đến cửa hàng kim khí và mua 1 cái bàn kẹp thợ mộc . -Sau đó tôi mang mọi người tới phòng thí nhiệm , đặt ngón tay họ vào đó , và tôi kẹp họ 1 chút . -Và tôi kẹp trong 1 khoảng thời gian dài và ngắn , cơn đau lúc tăng lúc giảm , có lúc nghỉ ngơi và có lúc không- tất cả các mức độ đau đớn . -Sau khi thôi không làm đau mọi người nữa , tôi sẽ hỏi họ Bạn có đau không ? Đau như thế nào ? -Hoặc nếu được chọn giữa 2 kiểu đau cuối , bạn sẽ chọn cái nào ? -Tôi tiếp tục làm thí nghiệm này 1 thời gian -Và sau đó , giống các đề tài nghiên cứu hay khác , tôi nhận thêm nguồn tài trợ . 
diff --git a/nlp-tf.png b/nlp-tf.png new file mode 100644 index 0000000..59866bb Binary files /dev/null and b/nlp-tf.png differ diff --git a/ocr/1.cnn-rnn-ctc.ipynb b/ocr/1.cnn-rnn-ctc.ipynb new file mode 100644 index 0000000..53f2683 --- /dev/null +++ b/ocr/1.cnn-rnn-ctc.ipynb @@ -0,0 +1,825 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget http://baidudeeplearning.bj.bcebos.com/image_contest_level_1.tar.gz\n", + "# !tar -zxf image_contest_level_1.tar.gz" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import os\n", + "import tensorflow as tf\n", + "import matplotlib.pyplot as plt\n", + "from skimage.transform import resize as imresize\n", + "import cv2\n", + "import time" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "100000" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "directory = 'image_contest_level_1/'\n", + "images = ['%d.png'%(d) for d in range(100000)]\n", + "with open(directory+'labels.txt','r') as fopen:\n", + " labels = [i.split()[0] for i in list(filter(None,fopen.read().split('\\n')))]\n", + "len(images)\n", + "len(labels)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"<base64-encoded PNG data omitted: matplotlib preview of a sample image_contest_level_1 dataset image>
QEDAvlsU7NIl+vUT6ZzVOFycAOFmM84PJiwF4bGI7AL/R9z0AeuPr07azEdmYOC0zCU1aqJ6YgSlh4Dkf6Pd6IjDtpSOM3BXASEnMJANFWarvn5bJ+q+fuZH+cTGVbO8YBeBUj0waF1Na41aevWh1MnGf39kPwLF+qdPQPyrX4lC+j/OTzwCwOZi2ApoIIZQ+ICAgoEmx4Rn4eKl+Qri2WGaNWzI3BpV5/82ZGwC4/dj5AIyOZSlMSRcPJgv1Nw5YdfTFZbWzJS2roHQmD0BuUpbuH3/meq44//MAbA6mrYAmQmDgAQEBAU2KDc/A+4vCXL82/iwAnp99EoBzExPrLsXLe2HeR4piS31kZBsAQ0Py3ufKdK5QlGfljLfPgg18rZB10ucvaJeC9t9KiWN5alhWccOTLRwqdAOwNyEsPRsLjrWAjY/AwAMCAgKaFBuWgZvt+3ChA4A7Bi4E4Lte2NMHdv0zIHK9xTBxC8+txVKkTBak88CUrA4OnekBwM/oc7EicYD3btH7X2+sZF+tJ8xfsjsxBMC53ZLE6kejkkRpYirFjyb3AHBlSmoPBwY+G42Mh6KXleWQKrMmNdtpXu+FzpjTV7kmSRecDstBYOABAQEBTYoNy8Bzal/+3rgw7mdGRLubL8oT+7+7V/Cnu78CNB4Uc//0DJ8YeAEwO3GOhfMuhl32l6T7/vnkZdJmVTVQnM22nWuePF7GtObrq2Zj4QDbE3Jd/q+tPwTg8f4tAEyOp/nh4F4AbuqQDHjb1qF9GxWNjAeAfclipMh6cGYrAN8cknsjruP/guxJAC5KnwAg4/L0xWUlbWqhoMVvHIGBBwQEBDQpNgwDN9vZuJew5wP5FgCenpAc5uOTaQDyynIHs1OcKjYW1VjJIB79kDCCHkQLbGyiNpHOfLDIyydnhMGdGlPVyczcz8NEfOOrTmqZ1kr0laEysZetXMZK1Sx+a1wS+JudtAScqVnN9GkxhqUokDpjMqa2qy18V7e8Pj66jcNDokI5uEOKPuxO9Fdtc7ag0s59f24PAPvHdwGzx8O//r2Mh1f9hqxazhSL3DxwDQBPa/GMwQnxM8zMyDL5+6nzAOjMyljIF+PsbB8G4I92fR0IWvzFIDDwgICAgCbFhmHgJZVsHC/I6z8OPReAx85sBqAwnbAfAjAxs/jSbHfefGXEHgyRHW8RsJwn/9h/NQBTujqoZ/s2mA68GWBMeyX6ytREp4oFbpu8AICvnpQKLKPTokRoS2myqbho/n9pm9io8z7BZ48Jo8smhBn+2vY7ALgsJey5PSbjYjGRuX2a3KonLYw/kS4wrePrn85I267c/i0AOtfosllMQS3WSqVRufqy69/3yqMAjH12BwBZqqOJr/z3DwNECp5vHr+EE6fEV+Un9H5N6w2r94aNqAm0jFrMMz4l9893Nl2s3z4mxw828QXRPLNKQEBAQEAVNgwDN4x5YdZjeWFUkzl5OvuCPmv0Se69W3RU4/VvemCWB/0dS1CfnCwKezg12Q4Q5Tv5aYH1zXL6qpJ5A3xi8PncfkIY+KDqrws5udZer2k8Iyz09/tfI597RyEv1ziRlO8+UnwJADdteRCA17U/AUCLL4+BuJufl/TEZZ9v2ixM/9GBLQwPC8sbmhGbd39RznGtCj1M+7weV/qrLy4HXm0GXs/vYSuvsTPCvNvfIoW2+m/ZCZTHwxs3Sf99f/wiAI4f3sQeEYZx6mpp//QmuS6lbP0VBkUXqbc+9tj1AHy3tzLWA7JOxtJGyn+0UbBhZp7yElJu5qmi3tymvrNXdWIl4qVZlVbmQjTh9H5/WVU/zHnZXxTH6siUDqh5TCeGqemUbiuBSQNFSV+63ukAKmF90d/9IwCueddTVd9vj8vNPlnxvKwNeDFn9GBJJqKbR8TMtH94ZzRx5yd0G7um2n/FcRmOU+MVw1K7tpCQz54qinPs+yl5GPQlxgA4P3UagN0JT6eb3/FojsmtcQmbP7f7TBTUYw/
l1c4Rbjnj+9W0ZqH8/Rq4dllazBe7E4sPVlsK6pnNrv5NGQdXtMlYverdX6zaxvpo/4hM7Pv+vkByQMxS2/KaqvfVC5s6/ZQ8pKam5LocUHHAX8RfCsB7d34TkPQZsLHumfVGMKEEBAQENCk2DAMf1yXkA1PCrA6ObAJgZrKa4bmUMLz29HRUaQUac2guN/jEnJdfOP0cAMbGG5eYTU9JGz919PkAXHGeMKyNVJlnQFlhVh/rXzj5YgASTvp8045xANpjOS5PTc3eQQXMBHHf0G4ADvZvKjuibcXSSGyT/SavTjAdDw8cE9b3uDq5n7/tEAC/0vsDLkw2tuTu0xVFZzJHXB1muby0caKUtgM30MjFw5j3nxx7JQCHx4SBW8DLzjaR1v0/O8SMsNrjpNJsNrfJrPr++ZY675++We7ZLZNj0ZL5xDVq+oxJv7oWuXdcTL73JTWF5uKzxoGZJB86JoVR/rB4EwB/sPtrAGRcLphTFIGBBwQEBDQpNgwDN0Zyx5AkrTJ7qTm4zBaaSIlt9eLOU1GprEYZ+HJQ9KXIeWlsaZbzUu3zlGYzTEstOzYtzKS/aHa81Q3wMVZdi3p2xFN6Df7smVcA8MRJsfXH9bw+4aRgxRu33sNFyfr7nfLCavuLnXL8KTnOzGRqdpIv6y9zITTCzJWJTw8JA5tOSn9+NycsMO9j/KfNtwOwOzE/E+9RCeIrex7irqN7ACiV1obT2PW3sXRyQPorpgw1rsy10WC1paLSPwTVQVpzrVhtTD2aOxeA0X3Stp4DKU68TH0IW4Rxx3pkPPR2i6/i8l5xiP6oX1ZQ/Sc7oVDjQ9L35is5NChJ4j7RKo7W3+j73oLX9mxBYOABAQEBTYoNw8An1KN9JifMpDCjTTOSprbvzV1ih92aHomE/muFvLaxWKrPJGNpYR0lK+RQqU7R384U5LvVKuwwEKkb5NgH8uLR//awpLx9W+8PALEjtjhhOCa7M1Z4ekJYX35cvi+qvC+nyqBcKUl6DqnepKqJ7prYJ+814MqXXLm/bFGVlHNPZ6vtzEktP5fPJyLfwaw0BbbKMUY+Lkz8vlPn8KmE+BnerefaNgdNMfa2KT5OS1qYoqmFhkuy2hooHgdWRvlgSquhUo5jBQlPHzOZbLGcPgBgpri2zpFFJXHTtn7/jJQOLHXI9frJTUnsDGqZ9zVbDgFwSfZ41b7uyieYGJfrMOu+0fEypX6Pu09Kn/WlrubXuu8G5r62ZwvO8tMPCAgIaF5sGAY+C6Vqu5jZvntbhGG+qO3RqFTWWmDcT3OmKGHCkzVh/Ol2UcOY/dK0MaWJcvc6ZbFtyvR2aOkuWNlkSSMlacNHTv8cULY1ZhIaUOPVjrj5e+xOSEunS8KaIlY4Wa1vtyCqmCokzkudJski+95TtnUnZD/GvLd3jwBwfd/TADw7exgQJcgnj1wnbVLfgalExofVRzKtHETtpkNDbezPyjkf6pKVRE9sYXtpRln/6Jjs9+9PSAj/ZYtMWTwfjIHvn+7ic8efJ+czJezS5+U8vLp1hsalHcdUH366+MyGCSkfLEk/jszIq9Pr6VuK0bVtycrYOrdTimdc1CLpY29okdiC61rkWr+iu49PHZdr/ES/+Fxyupqya2vMfCIlnx+d6maya1VOrekQGHhAQEBAk2LDMPC4GrzaVcOLFUBwGmatduZNaWHgXbFp2tYg1We5tJvjcyerWZMxymxGWPVztkrE2vcPi/03NzG7e1+748FVaafZvvdPi3bWmPdAv0T2pbLSxmxSPPqfjF/Pb/XeAcCY2vY//oyEMpvd2WD22eFcub/nKjlmq6JrW4VpfTN+ie6k/BvTAne2ipb87ef8KwAvbhHmbQmqJn2RK/aJXv7+3DkA3HLmZ+Q8ZyTMe7qoeu1CebUwOCntvG3sUgDOX8BeuiU+TmdaQ/+9qEFOToiaol9t4RfX33RRGCnJNTg4s5eTY7L/vPV1TTSvhZfbNbni/M/PSrNqkcH
TvtqP0r1K5crsXsh76ZPIFxQrX9xERpYQl20Wxv2urf8CwIVJWXHWJh/bmejnXNV3f7pNfBe3HJTrlitUJ4krqv8omiMCAgMPCAgIaFZsGAaedDVqjJoiwNmM2NTs6dseaywPynJhpd2+PnoVA1NiU7WIQqergs1toow5JyMpTjMpse3mqLC56un880lRg7xs3yMr2k5TBnzimDC2wWEtMqGfW0TrYS283JuZ4EBeol3PqNb45LCwdctNEUH3YbZ/UePUV88Ys8q4mgjGCp9GIi0sbXub2L6vTB8BYHM8W7VJi4NuZXdZdwiAEx1iEx6eFpZ9sCTnkB9VtlaCsQlpw4FxUeD0d8r1mqtMWl/c844ddwLwoSFJpDWqfoAj2kenk89oG5duh55Uonr36LnkzI8yhwjJ4gZGVaVyrNDBpanqPrXyZV8blzF1RUbaeLWmyV1pBl7UZdQDU3tk/3G9B+3axj1trXJ/vnnLXQA8W1fMbU6ubW2isc5YC/s0VfDPd0sE6COb5EodGNtmB5bD6HFmSglyfgOFMK8jFpzAnXPnAJ8BtiAL4Y957//SOdcD/AOwBzgEvMF7P7TUhtgFmSyoU0cvlk2SHTqBv65bqoZbEMZqw5yCeR+nf7Stqm1xlQ1aHuvkPMm1VjOQZ7yU42BBJuZT49LG4lS1JMsm8nxS+m1wOsvTMxKGfvMzkhqg1nRicOlqB6zUMFyk+arChGKZBd+49R7dnwav1JMmqnnAsvPd1C4mqLyOl2MjYvLIm0mn5KIHrCWmqq38U4veeCtbExK6vrNLXg+ekqRZtakPNmmAzUIZD+shFr36yOG9EHwFkalNgHWvBtL8cFheM90ywZ+bPACsvMTOTDWWwmJYk7lF92qixI5OeSibSTRbI1WtBzPHbYqJSagtOR3tD8Cr6UStqRwc2wTi74ySpy3levw0oJGzLgC/672/BLgG+A3n3CXA+4HbvPfnA7fp+4CAgICANcKCNNZ7fwI4of+POeceA3YAPw/cqD/7NHAH8L6lNiRnecBnykthaaFWs+6UdKF9MVketsWql9urhRkvz7gD41ui3NTGJpMqbdyZFdbWnagfXl65TbFGHrkSmPRF7hqXoIqZwvyXtKRytdHpDN/pFwejLdNnmU4UqRZhdr+6R8KsFxNAFZ2vI+oDMzF16bXsnscZbcyqzQnbO0dDqDerUyyl8sjK9AUmybNgmHyUGnbudm/XepxWpeegfj4wLiulJ/NC+fYmBoEys1wMLA/5azbt554Tu6rbXYs6BN2Y94eOiJnniTPSJpNAgqShuCH71KxtVwLmoM7E5PolbRxEKZ8dBXVsWjKwSS/3xkIpfgEyuoI1oUJax11Og7gstfS57Wc4U7L9bfxas6uJRa07nHN7gGcDdwNbdHIHOImYWOpt8y7n3H3Oufv6z6yN3TogICDgbEDDhmTnXBvwJeA93vtR58rMwXvvnXN1jXre+48BHwO4+vLMnIa/kjLdSJpUw0wKJXn6l1h5BlsP5YoyYlM+k2ulOFPNUNuz8ptXdItdNreArRXKNs1GQukrK4RXojbseaLkowIYhcIczh3t+cipWYyzp02CLCyAYi6YTLJPCyAsxpGXrMPWzYF3UG3wF6XEZtvTQH3LmHKOPUmpGh83W7IlxqpIjFRcRGIqY8ev7RNH2v4TIlOc0JSp903sBeDK9EkAds0ho5wPVkiiKzZJi65CxhexvflNnhkVR+7YoLyfSGniKL0FH+8TLtUXP7GiwT9pJ9dtT3IAgJakOlVt8VNykU/itvZLqn7baIpfgK1pGWfRilcRT8h5Ds+00BUzKeHyUkQ3Oxoa4c65JDJ5f857/2X9+JRzbpt+vw04vTpNDAgICAioh0ZUKA74BPCY9/7PK776OvBW4P/V168tpyGxWhmhMStl4gfHRM6V27I28qExLQn2vXEJ4RgYb43YqyljOjPCAqw01xmE7cyxGAHK7NOSJY2UhNF1VtiBa+sU1ta
mpJHalK7mtaZ82fhUOqo7WprLDptUJYHZrNVO3Ej63qSm+k2oaoOYj449My3b/8MxKbc2sEXUIr/QISW8tsTFBppx8Ui6ZjAlRF5D+VtTmr628mematNjzxpbdVDJjgHSalfOaUrT+wbFZv2GrnsB2LXgHufGjI837Aux5Fo/zp0TnYdVcbfFm4Wan1Hp6N8elbS/F537xVnBP8uByRJNhZJW9ZXTVZbPx5ickLbdeWxv1bbv7pMUv/tUgRTTixR3sUhJYkq0h0ckGK2Yr+GXunrNFRPkdcV+tqpPDI2YUK4D3gI87Jzbr5/9V2Ti/oJz7h3AYeANq9PEgICAgIB6aESFcifMaXj+2ZVqSFzpRGRXs+PXhBgX18gGPqz206cnxD48MZWqSIeqZa9ah+tu6/3cbcxrmtwv918FwHN2fh2ATiUS90/PVFUIh3KhWWPitYWZ22OOzSlJ22l2+aLpvq0kXb46Red0LskPDol+uDiH3TymttVrN/9E9h0F5zTCwGXbDg1RPxYjCsgoKmO0wKH7UlJ2bUAL4b6882EArkwPckjjAvbn5Dc7kqIC+frgswE4NdI+69i2QsrqWMosot1WKNtWUZbIazJvaWZXpoBANEYWkIObNv9vH7suatN0zoKAqvdRmJSxZSkPxGa+8ioNU4tkNQAnpkqxYgFK2oZxVYk81ir2+Ic7JShne0JWnG0uTS3Mxn90TDJVmZrIztPiB7KJPD0x8w8FG3hAQEBAQBNiA4XSy9M1spmafFjZVLfqc+MNVcJdOsweN6a69MmCqjtyZfaWaZWnf16VMcYGzhQXtoE3AmPaPTUFdSMbeA0yLs6r2x8CYGCbsNjbilJibFjb7WtWLqXpeLS68dP1Gbh5/WsxXlq4qKyVu4tYWqpIKV9d1Hha7cuPnhCWNtgtfoHepGgzJkppnpqW7x6f2ArAk8OS5vXMmPR1bszswWW9eUwLRWxp0VXJIhi42Zkt6tRCi8fVFm1qqbXSH1sEby4Xn+UXmnUr6Od51YtP+BSTJfHPzJV8bClI6rnvapXeeapVolbHcq1ltZOyZ9PR3zsuq71WtZ/vSci2sYqTeGRadOy16Yxt6Fr06i/0PhCphs52BAYeEBAQ0KTYMAzcEHc1DFzzIWQTFr23NravSY0ks3wavkJfbAqF67ueBKA1Jt/V2k/rwaI3X9d3P1DWH5vufKyU5Wd+8VEAHvqiaGmNeb9jDvVJWyzDXmW6r+oUTfrd/XsAqG+lF/haL/+sH8h5HZsSm+TXxy4HpJjGOXFhyb1xsXXWJk7q0WRjl3ZIrNeB1s2MG9O31K8aYVdQZjw4IQz8m8flvL9RelbEJqfzWs5NU/kW65WtQ0rvtbdJTo1ru6RowKZ44ysi0xe3parzcZgaxAosDBSf0fNfvM56R2I0YvjDLdJPc0XBVpWhm4t51yCvEag/mtzD89IyzrLLuF9sbPYXZeweLAjjHsnPE12p12ViVNj0Pz0lCbfu6hB1ys52GZkXtp3ikVGxj5vte66cPFnts03x8SrV1tmMwMADAgICmhQbhoFbdjmzKxvLMLvXnqxEDSZXWYQy7oV5Wfk0K21F0UW66HbNjHhJ5hhQTqDfp/rlrhZhLEMtbbOYlbF30xvn1eb+lNra9+d28Yt9knHxt9/97apt59N9m43zHC3VtqtdbIzGaieVCUXFgQtutka8BkVlv6emZBVyV0HsmA+P7eDqzkMAvKbtx4AoYaAcpdmp7XlR22MA/FvHuRxUjXDJV7Nna9PEUEvVK5QZsMFUIfWYN0CmbZort0jmwEvTcn0WU3qvKybX58puSXH75AmJFp3RDIc3n3iufL9Hvl9KqbW+eCnKK/NHI68EYHouBm5YXddPXViRkIOqBPrUwI0APDYsfolhLZwxqel7q9potnBdKU3r6zH1WZxuFV/N/aWyor4wVe0jqS3Bl4qHVBy12DATuCFfU43bRek75XVhN9QSj6t5v49qXqCbT8mNOl3
hvExqVZvzOyWM28wISddWta9XbZVJ7X8ev3HB41oe7///+MsA6ElNckP2CWBxlcINnTqR/vpWCZz4XFLSod7+E0l2NT1T4XxcYFIoqcnj0EkJourqlBs6GS+RK8rQuWdYlsQf3HELQBQ4Yg+U3fZAaR3iRFZkg5MFfZjUVB+vDIOPmjhXdXbbRs0jaXUs7+geoSUu5jZzSNYGA80HC+fflzkl+83IviYG5EE4ZBK9ZVTq2RxvjdISWJqCaVZAnqhkx4QAF2eOk1lCTnAzmdjE/YeHb5L3AxpMp87nWQ/TRrpZr3GUv32+7XTiNjlrOmFpmws0s/Gg0RQZjaB5eyEgICDgLMeGYeBFfZaUaoJgSsrATk4Le7MAm7mqqywVA0VxfN06LjUXHzkpR7AwZeKe1hZ5cr6uV0wcVmSgFv90QgJw6jGLKQ0j/7HWeHwyJvK4p4bFMdSZyZHbtPTLYk61vJfVwfa0OIsaLSBQBWVLJQ1bHzojppRUdobBETnOjl7Z/5GCmJwuSE5W7cJMK6/vvZdnJsQB+PT47CCOOdFgsy1Aamw6zeCMsONHc5KQ6qLkEw0fzhI2bU1IYYKYyVq1HTmVQq61nHBe1CxcUir/zMRmovNZDCyNxGcGXgjA0/0yNnMjNZLN5Zh16m1r56FjNaaFRLb2yrXY1yEr363xSaCtdusNj0ZSZCyWhQcGHhAQENCk2DAMPCrUoPKteEaevkW1wz4xIs6ksb7VsYL3l6Qrvtt/EQDTk9XVwl2myLYOsVta8qraYJZjBVkljFipqdxs++OM2tQ/9uh1QJkZG4NMxosNFSCYC2a/PF6UJ/nTk5IKoGD2SmPixUV4g032p29nKmReFqhxsiBpRH80I8z/Eg17tmINPfFxOlOyyrGq9H4BJ2pD0POwQhXjuTSnUrJSeCYrNtv+rBWFEAY0X1CLySFj2vfJGsdZSVeA86UDnsvGuVulsP1Fx5MzErRi8sQlwQJcssKYzV6/p1PSDfTFx1gKR7N7YWBGrm3tvTBXO6QxNcFGhvmusdq6nQZgZdtkDuhplTnhRVtlBfWGTln59jZZEE8t8240RUYjCAw8ICAgoEmxYRi4BVu8ZccPAfiTwVcAMMxhNzcAACAASURBVDUhTbSyXxZgs1K2x5GSsMKDWjLr5JgG7liQixKJRKrIxR2SiGc5iXRMVjhLNqbHGcm0cLoobRgoHgcWDhYp+hIlpTgWbPGVkWsBeHJYizXMk2CrYURFIWJVSbEAPnfseQC8Z/d35Iuk2MZNljlY7OLkhFa9b1S1UI/ZGWrssBZoNTma4bjKHx9KqA28RfwNO9uOLHDAMiwoK15zXPNh9BflXE4Xn4mkkwvZOK/rkMCv749cwH2npE21RQuWglRarvm7Lr4TgJe0inSzL14ittji05SLQB8fl1VVpDYx6L3qTCWizDnTMhNJTxNqh58YadF96PWqZfGuzLy7ukTltF1Xur+5U5RUl6REQtyrK6eVTAuwllhsioxGEBh4QEBAQJNiwzBwYzGmj7XCt1O+mkE0UopsMRgsClP42hlJTzo+WeNpVwbW1THJZa0SIFKbSMcCHvqLEpSwHLtmoRDjC6eeA8A1u7/a0DYlPM8UZCWxf1qS4f9Yk+JbsEXBysEtp6iysaciEQvLqz18SkPdW9XObOqHo9qurw89u1w8uZbRGWoCN2KpYsQu41o0wGz55kuIApNKFhTkyBXlOAed2MAf7hC2e1VGGHgysbAt3IKyrGjHgIa823E//sz1AFxx/uc5UmjMxnnpOyWw6PB4D6NjopQp5OQWdOovqa09UcrOHbzi1E+0vVtUGsa8L00tPcx8oDjBsYKMnQFNGBZdd73mCbW5b+nREntZSRq2tWWMa9qloPK3ByV0/omUrAAHVMEU+YUqFjY2HiwFxRu2StGMK9Niy98cbz7FST0Y065dmc2VIqMRBAYeEBAQ0KTYMAx8IcxXJGEpMLXGYVWOHJ0QHXNhurpLzM7
Xnp6mT6MKk1QzcIum/OwxSXVqipJFocLWujUjjGZijqK85cRXwoQGS3H2T0vBg08dlcjLEyNyXpb8yattcsVCsov1JSQW/WhqjkG1p57OtUVtmdWGGuadaRe7+fbuEa7tlWISFol7YFzCuI9o4qNBSytrkX0FF6lmLGLwByfPA+C8jJRt3dnx1IKn16eM/207/w2APzqjIe9DojCy1UR/sZV2TYBVa+OM5zTt6k1yDk9OSdvHZtLReHbj0k+Z0xrf8EM59xPXyP6nRYJNKVuMGHeqRfa/rVvG4/v2fBOALXUKSC8Wp4oxPnlUVhe5yWpGaOkKomRhWujjNV1SCm9Polyi+YqMrFb/IvYSAO7VVenEjK4OKv0g2uyMpploVR9T2xI07BsREbNWpl2rNlkK8zYEBh4QEBDQpNhwDNy8/xEsr80yiyTUol7RYqhgqtYezX+yq22Ii5IDACSdsIjTavt+eEZshsdGxGtfXCgxUT3oeWYzM5yTMR1v/XOe1Lwtt05KcqlHJnfwkwmx9xrzntSkQdH5LIacxZeg5V0Ak4VUWXFRs19jdq2dwuyeu01Stb51853s0HwzdlXGOmXI7s/tBOCzx0RtcxRNRTqZjBREVqhiZEKu19GZHqDs9+ich76YT2aHFh6IcpY4YeDm5zhdbCenxT9qbZzGvBO6enh4SMbJ6FSG4qhs09IvbdzzlYGq428TMRaHXl1moca833XJvwLwsrZHgPJqYSmpbWvRX2xlcErs82avdhoR2douK409XTI+n9N2EIDLVd/fGSvbqnt0lfjrW0RJcnTidXI+6kMojOt5VYypMV3VWNnESS/nu5xUuBsJy2Hac2HDTeCGFnViDqlpYVIvrlVzHygeX9KAtYo7g5r18MlxCRAat4xqhWqHjd24r+55kBkNn34iL5/tnxan5Z8/IaVBJ8ZqKok0AFsWW5Wfizad5hXtUhMyO8cSsl8n5UcmRSb3wOA5jE1L/0Qmk5p6guUD6us8ocxmNnIqBTMnU5X8r6ICDkChZKkQqkPMK2uYRs7LmmPHNVnRDTtlQjCnzoXJEm2xageWJR3bGj8MwObdYm76eEoqsT9xpo+xQXO+6TZq0jo4IfaIya7GH7DmzNzaLscZHtbgFp2IPnn0ev7i3H+savfb3v0DAO6dkgfsbQMSHHZCa4DOTCdxWqN0279N1z2umVAqn7wtabknnqXO2OU4K2thZORYYVe5Io7CHhwv3Cn51V/dLbXNTd6XrDPBWpCbJXwzeemf5sUUdTyvdS9z8Wgs2dj9ar88AK/aKU7fpWR8PFsQTCgBAQEBTYoNx8DNCRbxNv3HWNTnT0qa16t2f4XuKNXs4p9DwxoQ1J8ThlecmS1vgjKzPJDbxoQ65GzZfuSMsAgLLa8XOj8XjHl39whDed+FtwJwWepEZDrJzlF3ckJD7fcPiRnhSH93JBOcK1921KGVZpna9K3KvI0RZ1qqQ8KNIRWmEpHT1VIetKeFSRpjRQNI5q1hqm1pycq2+1rEybhPw/Db6lRdMeeomTguVxZ4Zaew0qHpLOOjGjyiJhsLs7drPeEbH/bmGHzz9rsB+OMBYZA5dWYO51o4VZT93tgiv50sSb8dL4j5ZSwvv53RIKDiVBynTkxj2mYyKTsvZV+VMkILIrK0BaeLh6v6YjmwVd3Hn7k+qohjY3Rzp4zRN226C4DLU3J+LU5Ww/Pdf1axyeSlL9giDuQvDolsN5+LR/ecCQhOTMpK5WRR9n/eT4cvc1UQGHhAQEBAk2LDMXBDlETIYkfUITUwJWzn6Xw325TtZV3jzoEpL0ygv6iVtKctuKS+3XpKA3s+e+C5eGUKURDJUpyVCrMr/s4FtwFwY4uEzc/Hpixg6EheUt2eGpO+yE8mZ7PpGsZt4coRSq5s01YktU09WrjB6lEmlYWaI3nSO0p6PGPPv3bOvwBlh5rBbOC1aYIBYsrwzu0Wp9gLWx+X49F4v7ZrAYZ/3ym1H8/kWzk
yICuj/JStStQpZnU1feOUrrfGmdmidRlzWoCh3nkZiuoPmC5oCtp8eXXk9dxNJlh2Vs5m3gZb6X38sNj7r7pQVh2bl2EjtjH1iDriT420R+M63S2OSKsedH5SnJUt6sRfzMr3WSnxIdydkH0kdJVVGVRu49GCwiZKlj5D7/MmDaFfTQQGHhAQENCk2HAM3NLKdqfl9ZiyQgvZzimbOZTv43mZ0UXvf1Dlg3eMijJgcFRrXs4RYm51+grjyRWtS2iKAksd0Igdc6QkDbBybzMWMFRP9aLMO6nqloymGrV0qFOTqXIYuiXQV/bclRGWZPVJ03Hps+1tso+BljZODoqdckenhHFvTQzXPQ+rdTpdTMySMjpt9uVdEvTRFbMQ98ZDp03tEEO2fXHno3w7Ldc2jyo5jNnNyBhaTjGGWnVUsc64MTv9JlVgFJWl11vlzRcqXwvzsYyrhNHSF1+ays+5zUKwMWW+pcrkWrVjdFNs8czb+sIUVfvSUqrOkl1VwVa4BdnmPpXJXpEWeeZPi5xwJREYeEBAQECTYuMxcGWOv7xNPN5/OPwqAIaMgWtJq/Fihmm/OAY1Xsrx+IyU9XpwUDTU+ck5wrsL1Trnedn3XGbQebZZSmBSTtlsQZlxVch+TTkqs3mnVFHSnhFbtRWNnpmJU4pXK29KyiZj2rbXbHsIgEM5MdTGlbHuah3iwZj0346sMPD2KMVutXLGilOYXbMS5lMwLb7G4iwJZh9NUiSlxW/nwlISomVj0n9RZXSlPtP5ZBSbMF6SYJzaQh8RGkkkthJFLhYBU3qMaIh7frJ8nWzlkF+ET2IupJ2Mg/aY2sBjdfrewgz0uEOF7LKP+9OOwMADAgICmhQNM3DnXBy4DzjmvX+1c24v8HlgE3A/8Bbvff1aUovA4YJqXfOidTXGaGW4TIv8zHQPE2q/a5QgnCoW+NLg1QAc7RcmPku9UYtKJlSr7EgIi0hoytO0vlpS+9xkallKlblwSccJAB4qCQsm5iuiJ1UbrzbGNmXenWlRFHRp2PNjhS0M2+ojirCU10xcS3OlhFFmnLy/TCMAi8R4ZeeDQLmwQV9sftZb8m4WAy2pdn1UddJnVHVw3rx7qg9L8DXp26IxE6EmHcOsdA0NIKPb7GiVFcfhuKQumM4lI3/K5ak7AGhTWmSa+GxS+s9Kn5UmKm67pPoqWqrt2GaLrjd+ViKxWzkhmtxnpsaqHO9x7a/8InTzc8Hs5js0IVxnixx/uKW1fI56Wkkduz/b8SgAWRdCMefCYhj4bwGPVbz/MPAX3vt9wBDwjpVsWEBAQEDA/Gjo0eqc2wm8CvgT4Heccw54MfBL+pNPA78P/K+lNqS2HNXdfyuRWhOXKiVQhllQJn50souxBpmBaV0fz/fyb8f2yn4mddu5bI01NmVcmd1a3hJjQu1ZYRNv3yupR01n/NFHbyA3BwOfzAn7rSzNBfOrUUxT/cCQFCiwVQkJT0YLwVrhg+6sMO3zO/sBeH2vJMm/bfQSAA6nuxmurVKmCpWMqk5Mx/y6dikqa0/7pIsBsv8kFnm5BIWAHn8oJ/ZX6zfLV7MYtYMlJ/vB6AXRedT6Jlo0XWmstmpCA0hakeNYdXxCIZfggQG5Hv09wmL36jamif8P54iO+veHXgPA1GR53MY1UdR1eyUPzI6MqHm++MQVQJ3SeysES4h218Q+oOxbqoRFIds4sBiKNjeHjb8BWJ+8c5eUf/ujoVdG5+j0u10dQ/pb0Y63LCLO42xDo3fIR4D3Uvb6bAKGvfe2bj4K7Ki3oXPuXc65+5xz9/WfWfzSNSAgICCgPhaksM65VwOnvff3O+duXOwBvPcfAz4GcPXlmbp89/7pmVnlqLqnhFG2npYmHnm5/lij2U5PtkeqjIVgWtfPnnp+uajAQhkDayIYU5k8mzqEbbanpG3GVDel5fMXZqVobUnp2de6LufwZP08KVFUXUVpLiAqz1UJS0NpLLArLex3U7fojFPxIh1
<base64-encoded PNG output omitted>\n", + "text/plain": [ + "<Figure size 432x288 with 1 Axes>
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(cv2.imread(directory+images[0], 0).astype(np.float32)/255.)\n", + "plt.title(labels[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "charset = '0123456789+-*()'\n", + "num_classes = len(charset) + 2\n", + "encode_maps = {}\n", + "decode_maps = {}\n", + "for i, char in enumerate(charset, 1):\n", + " encode_maps[char] = i\n", + " decode_maps[i] = char\n", + "\n", + "SPACE_INDEX = 0\n", + "SPACE_TOKEN = ''\n", + "encode_maps[SPACE_TOKEN] = SPACE_INDEX\n", + "decode_maps[SPACE_INDEX] = SPACE_TOKEN" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "image_height = 60\n", + "image_width = 240\n", + "image_channel = 1\n", + "max_stepsize = 128\n", + "num_hidden = 256\n", + "epoch = 20\n", + "batch_size = 128\n", + "initial_learning_rate = 1e-3" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "class Model:\n", + " def __init__(self):\n", + " self.X = tf.placeholder(tf.float32, [None, image_height, image_width, image_channel])\n", + " self.Y = tf.sparse_placeholder(tf.int32)\n", + " self.SEQ_LEN = tf.placeholder(tf.int32, [None])\n", + " self.label = tf.placeholder(tf.int32, [None, None])\n", + " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " batch_size = tf.shape(self.X)[0]\n", + " filters = [64, 128, 128, max_stepsize]\n", + " strides = [1, 2]\n", + " x = self.conv2d(self.X, 'cnn-1', 3, 1, filters[0], strides[0])\n", + " x = self.batch_norm('bn1', x)\n", + " x = self.leaky_relu(x, 0.01)\n", + " x = self.max_pool(x, 2, strides[1])\n", + " x = 
self.conv2d(x, 'cnn-2', 3, filters[0], filters[1], strides[0])\n", + " x = self.batch_norm('bn2', x)\n", + " x = self.leaky_relu(x, 0.01)\n", + " x = self.max_pool(x, 2, strides[1])\n", + " x = self.conv2d(x, 'cnn-3', 3, filters[1], filters[2], strides[0])\n", + " x = self.batch_norm('bn3', x)\n", + " x = self.leaky_relu(x, 0.01)\n", + " x = self.max_pool(x, 2, strides[1])\n", + " x = self.conv2d(x, 'cnn-4', 3, filters[2], filters[3], strides[0])\n", + " x = self.batch_norm('bn4', x)\n", + " x = self.leaky_relu(x, 0.01)\n", + " x = self.max_pool(x, 2, strides[1])\n", + " x = tf.reshape(x, [batch_size, -1, filters[3]])\n", + " x = tf.transpose(x, [0, 2, 1])\n", + " x = tf.reshape(x, [batch_size, filters[3], 4 * 15])\n", + " cell = tf.contrib.rnn.LSTMCell(num_hidden)\n", + " cell1 = tf.contrib.rnn.LSTMCell(num_hidden)\n", + " stack = tf.contrib.rnn.MultiRNNCell([cell, cell1])\n", + " outputs, _ = tf.nn.dynamic_rnn(stack, x, self.SEQ_LEN, dtype=tf.float32)\n", + " outputs = tf.reshape(outputs, [-1, num_hidden])\n", + " self.logits = tf.layers.dense(outputs, num_classes)\n", + " shape = tf.shape(x)\n", + " self.logits = tf.reshape(self.logits, [shape[0], -1, num_classes])\n", + " self.logits = tf.transpose(self.logits, (1, 0, 2))\n", + " self.global_step = tf.Variable(0, trainable=False)\n", + " self.loss = tf.nn.ctc_loss(labels=self.Y,\n", + " inputs=self.logits,\n", + " sequence_length=self.SEQ_LEN)\n", + " self.cost = tf.reduce_mean(self.loss)\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate=initial_learning_rate).minimize(self.cost)\n", + " self.decoded, self.log_prob = tf.nn.ctc_beam_search_decoder(self.logits,\n", + " self.SEQ_LEN,\n", + " merge_repeated=False)\n", + " decoded = tf.to_int32(self.decoded[0])\n", + " self.dense_decoded = tf.sparse_tensor_to_dense(decoded)\n", + " \n", + " preds = self.dense_decoded[:, :tf.reduce_max(self.Y_seq_len)]\n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + 
" preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", + " y_t = tf.cast(preds, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.label, masks)\n", + " self.mask_label = mask_label\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " \n", + " def conv2d(self, x, name, filter_size, channel_in, channel_out, strides):\n", + " with tf.variable_scope(name):\n", + " return tf.layers.conv2d(x, channel_out, filter_size, strides, padding='SAME')\n", + " \n", + " \n", + " def batch_norm(self, name, x):\n", + " with tf.variable_scope(name):\n", + " params_shape = [x.get_shape()[-1]]\n", + " beta = tf.get_variable('beta', params_shape, tf.float32,\n", + " initializer=tf.constant_initializer(0.0, tf.float32))\n", + " gamma = tf.get_variable('gamma', params_shape, tf.float32,\n", + " initializer=tf.constant_initializer(1.0, tf.float32))\n", + " mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')\n", + " x_bn = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 0.001)\n", + " x_bn.set_shape(x.get_shape())\n", + " return x_bn\n", + " \n", + " def leaky_relu(self, x, leak=0):\n", + " return tf.where(tf.less(x, 0.0), leak * x, x, name='leaky_relu')\n", + " \n", + " def max_pool(self, x, size, strides):\n", + " return tf.nn.max_pool(x, \n", + " ksize=[1, size, size, 1],\n", + " strides=[1, strides, strides, 1],\n", + " padding='SAME',\n", + " name='max_pool')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def sparse_tuple_from_label(sequences, dtype=np.int32):\n", + " indices, values = [], []\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " 
values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + " return indices, values, shape\n", + "\n", + "\n", + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: Logging before flag parsing goes to stderr.\n", + "W0829 22:51:45.737936 139691927603008 deprecation.py:323] From :69: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv2D` instead.\n", + "W0829 22:51:45.743227 139691927603008 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0829 22:51:45.988447 139691927603008 deprecation.py:506] From :76: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0829 22:51:46.013294 139691927603008 deprecation.py:323] From :85: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a 
future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0829 22:51:46.680723 139691927603008 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0829 22:51:46.682112 139691927603008 deprecation.py:323] From :34: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0829 22:51:46.683474 139691927603008 deprecation.py:323] From :36: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "W0829 22:51:46.684638 139691927603008 deprecation.py:323] From :37: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0829 22:51:47.023829 139691927603008 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0829 
22:51:47.727965 139691927603008 deprecation.py:323] From :39: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0829 22:51:49.237528 139691927603008 deprecation.py:323] From :52: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 100000/100000 [02:43<00:00, 610.59it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "X, Y = [], []\n", + "for i in tqdm(range(len(images))):\n", + " img = images[i]\n", + " X.append(imresize(cv2.imread(directory+img, 0).astype(np.float32)/255., (image_height,image_width)))\n", + " Y.append([SPACE_INDEX if labels[0] == SPACE_TOKEN else encode_maps[c] for c in labels[i]])" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size = 0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [14:21<00:00, 1.37s/it, accuracy=0.594, cost=7.13] \n", + "minibatch loop: 100%|██████████| 157/157 [03:32<00:00, 1.10s/it, accuracy=0.59, cost=7.28] \n", + "minibatch loop: 0%| | 0/625 [00:00" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "decoded 
= sess.run(model.dense_decoded, feed_dict = {model.X: batch_x[:1],\n", + " model.SEQ_LEN: batch_len[:1]})\n", + "plt.imshow(batch_x[0][:,:,0])\n", + "decoded = ''.join([decode_maps[i] for i in decoded[0]])\n", + "actual = ''.join([decode_maps[i] for i in y[0]])\n", + "plt.title('predict: %s, actual: %s'%(decoded, actual))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/ocr/1.cnn-rnn-lstm/cnn-lstm-ctc.ipynb b/ocr/1.cnn-rnn-lstm/cnn-lstm-ctc.ipynb deleted file mode 100644 index fafb125..0000000 --- a/ocr/1.cnn-rnn-lstm/cnn-lstm-ctc.ipynb +++ /dev/null @@ -1,351 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import os\n", - "import tensorflow as tf\n", - "import matplotlib.pyplot as plt\n", - "from scipy.misc import imresize\n", - "import cv2\n", - "import time" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "30" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "directory = 'image_contest_level_1/'\n", - "images = [i for i in os.listdir(directory) if i.find('labels')<0]\n", - "images.sort()\n", - "with open(directory+'labels.txt','r') as fopen:\n", - " labels = [i.split()[0] for i in list(filter(None,fopen.read().split('\\n')))]\n", - "images = images[:30]\n", - "labels = labels[:30]\n", - "len(images)\n", - 
"len(labels)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAACeCAYAAAAiy/EDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJztvXl8ZFd55/09tau0q6Xe3a1ut+22jWPTNmBjGwwJYXcIExgIL4EAQ5JJmJBkhmV435B18pJ534TMZ5IJBEiAISaEPcRgwMYBJ8YrbRsv3bab3jeptatUUi1n/niepzaVpNKuVp/v59MfdW33nnvurVu/86zOe08gEAgEzn8iqz2AQCAQCCwN4YYeCAQC64RwQw8EAoF1QrihBwKBwDoh3NADgUBgnRBu6IFAILBOCDf0wHmFc+5PnHPvXcbtb3LOPemcSy7XPgKB5SLc0APnDc65HuCXgI/Vee13nXPeOfczNc/f4px7e81zvc65251zg8650865/+mciwF4788A3wPevYDxOefcHznnTjjnhp1zdzvnrpzvdgKBhRJu6IHzibcDt3vvJyqfdM5dDLwBOFXx3Oucc++uePx659yv6MO/As4CW4BrgBcD/7Fik58DfoU66I/B4RnG9wbgHcDNQBdwL/DZBo8tEFg04YYeOJ94JfAvdZ7/S+D9wFTFc18D8sB/A94LXAz8rb62C/iC9z7rvT8NfAuoVNL3AbudczvnOb5dwD3e+0Pe+wLwv4Er5rmNQGDBhBt64HziKuBA5RPOuTcAk9772+u8v7KuRbHi8UeBNznn0s65bcgPxbdKH/I+DzwDXD3P8X0euNg5d6lzLg68rXK7gcByE1vtAQQC86ADGLUHzrlWRIG/rM57XwckgA8BO4ERxBzyMeD7iI18BIgCnwa+WvP5Ud3ffDgF3IP86BSAY8BL57mNQGDBBIUeOJ8YBForHv8e8Fnv/eHaN3rvv+K9/xiqyr33X/bef8w5F0FU85eBZqAb6AQ+UrOJVmAIwDn3i865IefcEPAosMMe678d+pnfBZ4HXASkgN8H7nLOpRd/6IHA3LhQbTFwvuCc+y7wt977z+nj/cB2xFYO0AMMAx/x3tfeoG0b3UAf0OG9H9bnXgf8kff+Ofo4hqj3y733R2o+3wvc7b3vrbPtbwDf8d7/RcVzQ8DPeO8fXOBhBwINExR64HzidiQixfhp4DlIpMo1wEkkOuUvZ9qA974f+Anwa865mHOuA7F1P1rxtucDh2tv5g3wAPAGjWWPOOfeCsQRe3wgsOwEG3rgfOIzwH7nXJP3fsJ7f67yRedcARj03o/NsZ3XI47R9yO27ruA36p4/S3AXy9gfB8BNgL7EXPOM8C/894PLWBbgcC8CSaXwHmFc+6/AWe99x9dpu1vREIjn+u9zy7HPgKB5SLc0AOBQGCdEGzogUAgsE5Y1A3dOfcK59wB59wzzrkPLNWgAoFAIDB/Fmxycc5FgYNIUsdxxMP/Zu/9E0s3vEAgEAg0ymKiXJ4PPOO9PwTgnPs88HPAjDf07q6o770ovohdBgIL57HhbgBcVETMc1rOzfb2wApx8NH6eVeX/lRmhUeydnno0cl+733PXO9bzA19G5LabBwHXlD7Jq14926AHdti3H/HRYvYZSCwcHbd/i4Akm2TANx/UyiEuBZ4+dZr6j5/xx37V3gka5folmcayolY9jh07/3HgY8DXHd1KoTUBFacb2dkVdizZRiAiSl5/CvHbwDgQ5u/w45Yy+oM7gLnockpLn4gBcA9t+0D4KY3P1x6DeDaZGJ1Bnceshin6AmkZoWxXZ8LBAKBwCqwGIX+AHCJc24XciN/E/CLSzKqdULOF+gvVPViIO4cAN3R5tUY0gVJwhWqHo+PiiLsy4oq7ysk2BFypleFa5MJ6P4+AO98z/envxaYF
wu+jL33eefcbwB3ICVIP+W9f3zJRhYIBAKBebEoXaJNBeo1FrigOVsYB+B0IcoPMpcD8OT4VgB+ved7AHRHV2dsFyLpiDhBUzEtyqienP4JUehZH+T5ahKU+NIRruQlYLgoZpX+gizt90/Kzfuvjt5C35iYVra2jQBwpktuIpdTXOFRXrg0O7mRX9LeB8CJPulb0Tci5+JwrodL4kcB2BhMYYHzmJD6HwgEAuuENa/Qx4r1C961RFIrPJKZGVBl/tfnbgbgrhOXADAymiY/IVM8EM/X/3Bg2emJympoU1JWSclUDoBsRpb6nzh6E9dc8nkANgZTWOA8Jij0QCAQWCeseYXeVxBl+7Wx5wDwwvTTAOyOja966F/OizI/VhBb7OPDWwAYHJTHPluWe/mC/HZOeXsu2NBXirSTOX9R6wEAvpUQR/XEkKzyhjJNHM53ArArJio+HQmOusD5R1DogUAgsE5YswrdbOdH8m0A3N1/GQDf9aKuPrjjnwEJD5yPUrd04loWEjplSUMPT8jq4fC5LgD8lP5OVhQ68N7Ne/urzVLO1Wpi/padsUEAdndKUa4fjUhRqPGJBD/K9AKwLyG9nINCn04j10PBy8pzUCO/MlrNNaffhfaI079yTuIuOC2WkqDQA4FAYJ2wZhV6Vu3T3xsTRX50WGKHcwX5Rf/v7pX8yc6vAI0n6Tw0OcUn+18ETC8EZOnH81GffUWZvn8+fZWMWaMmKExX486dP3XJTInNNlfnm0oH2BqT8/J/bf4hAE/1bQIgM5bkhwO7ALi1TSr8bVmF8a1VGrkeAPbEC6WIr0emNgPwzUH5bkT1+r80fRqAvclTAKRcjp6orLQtGinkAiycoNADgUBgnbBmFLrZ3sa8pGkfyDUB8Oy41HQfyyQByKkKHkhPcKbQWNZlpcJ44sOiGLqQWGRTG7WFgWbDMkOfnhKFd2ZUo1qmZv59jEXXflRLrRJbirkyKguV2cpmtFit8jdHpaGB2VmLwLma1U6PNqdYSIRTe0Suqa1qS9/RKX+fGtnCkUGJcjm0TZpg7Iz1VX3mQqHSTv5QtheA/WM7gOnXw7/+vVwPr/51WdWcKxS4rf96AJ7VZiID4+KnmJqSZfT3ExcD0J6WayFXiLK9dQiAP9zxdSDkAiyGoNADgUBgnbBmFHpRQ0JO5uXvPw4+H4Anz20EID8ZszcCMD41/1Z299y2r6QujJIdcB5YzZZ/7LsOgAldPdSznRsWh34+YEp8KebKopXOFPLcmbkUgK+elg41I5MS6dCS0OJZUck5+MUtYuPO+RifPSGKLx0T5firW+8G4KqEqOvWiFwX88kc7tFiXV1JWRHEknkm9fr6p3Mytn1bvwVA+wqdNstpqGWlokAqV2d2/ntedRyA0c9uAyBNdbbzvn//GEApQuibJ6/g1Bnxdflx/b4m9Qur3w27osbRtnMRz9iEfH++s+FyffVJ2X+wqc+b8+cuEwgEAoFZWTMK3Rj1orxHc6K4Mln59fZ5/e3RX3rv3byzLm9688PTPPTvXEB0y+mCqIszmVaAUr2W9YLNzWLmqlKZA3xy4IXcdUoU+oDGf+ezcq69ntNoSlTq7/W9Vp73jnxOznEsLq99tPAyAG7d9AgAr289CECTL18DUTe7TumKyjbfvFFWAk/0b2JoSFTg4JTYzPsKcowr1fhi0ud0vzJfPVHZ8XIr9Hp+E1uZjZ4TZd76VmlE1nf7dqB8Pbxpg8zf98f2AnDyyAZ6JfCMM9fJ+Cc3yHkppuuvQCi4UnTYx5+8CYDvdlfmmkDaybW0luo3rVXWzJ2ovOSUL/dEQb/sFu1nf9UpFosWp3WimYnSDaj7+4vqimLO0L6COGqHJ/QCm8XUYkxMJvSzkijVX5ByratdvqASm4u+zh8BcP27n6l6fWtUvvyZit/P2gQcc24PFOXGdNuwmKX2D20v3chz4/oZO6c6f4UxuRwnxiouS53afEyee6YgzrbvJ+THoSc2CsAli
bMA7Ix52t3sjkxzdG6OSpr/7s5zpSQj+5Fe7hrpVjO/T01xVnqgTxPprkqKuWNnbP7Jcwuhnpntut+Q6+CaFrlWr33PF6s+Y3O0f1hu9Hv+Pk+8X8xYW3Jamvg1c5tG/YT8aE1MyHk5oMEGfx79WQDet/2bgJT7gLX1nVlrBJNLIBAIrBPWjEIf0yXnwxOivA4NbwBgKlOtAF1CFGBrcrLUiQYac5AuNhnGnKFfOPs8AEbHGg9pm5yQMf7d8RcCcM3FosDWUueiflWNaf2Z/8LplwIQczLnG7aNAdAayXJ1YmL6Biowk8WDgzsBONS3oezYthVNI7lW9p6cOtX0enj4hKjCp9Rp/sIthwH45e4fcFm8sSV6j6442uNZouqAy+ZkjOPFpO24gUHOH1Pmf3ziVQAcGRWFbgk421sklO//2SZmh+W+TirNbDOb2Kq/P9/SYIBnb5Pv7KbMaGlJfep6NZVGZF5dk3x3XERe90U1nWaj064DM2E+ekIaxfxB4VYAfn/n1wBIuWwwv8xAUOiBQCCwTlgzCt0Uy92DUoTL7K3mMDNbaiwhttnL28+UWos1qtAXQ8EXS85QU1PTnKFq36c4XYFaKd3RSVEufQWzAy5vwpGp7lrq2SHP6Dn406OvBODgafEVRPW4PumkgcebNt/P3nj97U54Ub19hXbZ/4TsZyqTmF60zObLXBCNKHdV6pODotAm4zKf382KSsz5CP9p410A7IzNrtS7NOTxVV2Pcu/xXgCKxZXROHb+7Vo63S/zFVEFG1Vl22jy3EKp9C9BddLYTCtau6aeyO4GYGSPjK3rQIJTL1cfxCZR5JEuuR66O8XXcXW3OFh/1CcrrL7T7ZCv8UHpY/O1HB6QonefbBbH7a/3fG/Oc3uhEhR6IBAIrBPWjEIfV4/5uawol/yUDs1EnNrON3aIHXdzcriUeLBS5HSMhWJ9pRlJiiopWmOLyugXfe9UXl5brkYX/aXoCdn3gZxEDHx7SEr8vr37B4DYIZucKCAL8zPVeHZcVGFuTF4vaDhhViOPssU4yRlCAzMarXTv+B55rAlgvujK82WLrrgcezJdbaeOa7u+XC5W8j1MK6tgqyBT7GOi1B88cxF/FxM/xXv0WFtmkC2m7jZEx2hKipK0aKShoqzG+gsngaWJrLBIrsFilhN5SacftbDcQrncAcBUYWWdK/MqSqdj/f45abVYbJPz9ZNb49gR1Crz6zcdBuCK9Mmqbd2bizE+Judh2vdGr5cJ9Zvcd1rmrCdxHb/aeR8w87m9UAnTEQgEAuuENaPQp1GstquZ7by7SRToS1qeKLUWWwnG/CTnCpLWnKkpO5BslWgbs39a7E1xvDy9TlVuiyrBbdrqDJa2+NNwUcbw0bM/A5RtlamYJvh4tUNu/B47YzLSyaKoqpJqzFTH11tSV0QjMC5OnCXOPOfeU7aVx2Q7psy3dg4DcFPPswA8N30EkEiTTx27UcakvgeLQhkbUh/LpGoStbsODrawPy3HfLhDVhpdkbntrSldFYyMynb//pSUHLhqniWaZ8MU+v7JDj538gVyPBOiPn1OjsOrW2hwTMZxQuPTzxaOrpkU+IGizOPwlPx1ej59U6F0bpvScm3tbpdmInubpFzuzU2S23Bjk5zrV3b28Hcn5Rwf7BOfTVZXW3ZuTbmPJ+T54xOdZDqW5dDOe4JCDwQCgXXCmlHoUTWYtWoMMdYQwmlauNqpNyRFoXdEJmlZgdKm5VZ4js+drlZVpjjTKVHdz9ssGXXfPyL24+z49Ol93bZHlmWcZjvfPymxu6bM+/sk8zCRljGm4xIx8KnoTfxm990AjKpv4BNHJfXa7NaG2XeHsuX5nqlFm62abmgWJfbN6BW6kfJ7LBa5vVli2d9x0b8C8NImUeZWcCvjC1yzR+L1H8peBMDt535KjnNK0tInCxovni+vJgYyMs47R68E4JI57K2bomO0J7VUgZdok9PjEq3Rp7b0y+t/dF4MF+UcH
JraxelR2X7O5rom29jS4e2cXHPJ56eVlbXM5Ulf7YfpXKb2bvZdyHmZk5IvKVI+ubGULDGu2iiK/N2b/wWAy+KyIq0tprY91sdujS//dIv4Pm4/JOctm68ueldQ/1PpHhGYRlDogUAgsE5YMwo97mqiPWqaKqdTYpOzX+fWSGN1XBaLtcL7+si19E+ITdYyHp2uGja2SOTNRSkp6ZpKiG04S4XNVg/nn09LtMnL9zy+pOO0yINPnhBFNzCkTTf0ecu4PaKNrLtT4xzISTbuOY11Pj0kat5qa5TQbZjvQKJ96kfnmPJKuZoMywqfSCwpKm5ri9jO9yWPAbAxmq76SJODTlV/aXcYgFNtYlMemhQVfqgox5AbUTVXhNFxGcOBMYnw6WuX8zVTW7meqOed2+4B4MODUhhsRP0Ix3SOzsaP6hgXbsfOqJC9b2Q3WfPDzBDkZHkLIxoFcyLfxpWJ6jm1dm9fG5Nr6pqUjPE6LQu81Aq9oMushyd6ZftR/Q7auY16Wprl+/mWTfcC8FxdUbc4Obe1hdPaI03s0dLIP9cpGaqPb5AzdWB0i+1YdqP7mSrGyPo1lGK9hpjzhu6cuwj4DLAJWTh/3Hv/F865LuAfgF7gMPBG7/3gQgdiJyiTVyeRnjy7abbpDf31ndKV3ZJClhtzMuZ8lL6RlqqxRTVM0ep4x2cpFraciUVjxSyH8nKjPjMmYyxMVIeA2Y09F5d5G5hM8+yUpM3fdlRKGdSaWgyXrHboSg/IeZq7KkwuVjnxTZvv1+1pMk29UEg1J1j1wVtbxWSV0+vlxLCYSHJmAiq60g+uFdqq7YxUS3e0mc0xSbXf3iF/D52RImC1pRo2aMLPXBUd6xEp/fUlB/pc+AphU1vQ6wFN7PnhkPxNdcoNf3f8ALD0IX1m2rGSG0NanK70XY0V2dYuP9JmQk3XhMbWw8x3GyJiQmqJT5a2B+DV1KLWVw6NbgDxn5aKwS3kfKxHGpmFPPA73vsrgOuBX3fOXQF8ALjTe38JcKc+DgQCgcAqMafM9d6fAk7p/0edc08C24CfA27Rt30auBt4/0IHkrU66FPlpbOMULuFt0t51J6ILCdbItXL8+Viystv3oGxTaXa3KY24xpKuT0tqq4zVj8dvvIzhZpwzKUg4wvcOyZJHlP52U9pUcPjRiZTfKdPHJa2rJ9malESTaL8fqVX0sLnk9BVOl5HaQ7MJNWh57JzFue2Ka8WJ2rwIk353qhOtoSGY1aWW7AQQEvOyZVK4c487q3az9S6GB3S5/vHZCX1dE4k4a7YAFBWnvPB6rC/dsN+7j+1o3rctdQR8KbMP3xMzEIHz8mYLOQSpGzGzelnpn12KTCHdyoi5y9u10GpxLUjr45SK26W8fLdmKukMUBKV7gW+JDU6y6rSWVWSnt36znOFW17a79X70oyr3WKc64XeC5wH7BJb/YApxGTTL3PvNs596Bz7sG+cytj9w4EAoELkYYN0c65FuBLwHu99yPOlZWF99475+oaBb33Hwc+DnDd1akZDYdFVcKlUKga5ZIvijoosvQKtx7ljjtikz6XbaYwVa1gW9Pynld2il03O4etFso20UZS/ys7sFdSm6Y9XvSlhiD5/AzOIp35kpO0EKW3RZI+LKFjJiwss0cbQszHMRivo+bNIXhIbfh7E2Lz7WqgP2hENUhvvA+AqNmirdBXRaGnwjwKbZl6fl2POOb2n5KwyHEtEfvg+C4A9iVPA7BjhrDN2bDGGh2RDE26Shmbx+fN73J0RBzDowPyeDyhhbD0K/hUj2irnuipJU1GSjo5b73xfgCa4uqktcVR0ZV8Gne2XlH13kZLGgNsTsp1VloRK9GYHOfQVBMdEQtdXFxJ7PVGQ1e8cy6O3Mw/573/sj59xjm3RV/fApxdniEGAoFAoBEaiXJxwCeBJ733f1bx0teBtwH/r/792mIGEqkNWzTlpUr90KiEj2U3rUy40qi2UPvemKSU9I81l9StRd60p
0QlWCuzc4gammGxApTVqRV/Gi6K4muvsCPX9nms7e1JI709Xc3fmnZvYxPJUt/W4kx23LhGKpjNW+3MjZQrjmtp45hGhRDxpX1PTcrn/+GEtKfr3yTRKD/fJi3PNkXFhppy0VKonGGRFjktPdCc0HK9lW+zKDrd97Rrqw6V6hkgqXbprJZwfXBAbN5v7HgAgB1zbnFmpny0YV+KFQv7cfai0nGMTVT7mSw1/pyGqv7NcSlzvHf3F6clIy0GC4O0KJekRnc5XYX5XITMuIztnhO7qj77nh4pabxHI5wiepKiLlKKVLFIt8eGJTmukKvRm7q6zRZi5HRFH6JbqmnE5HIj8FbgMefcfn3uvyI38i84594JHAHeuDxDDAQCgUAjNBLlcg/MaLj+6aUaSFTlRskuZ/uvSYkurJANfUjtr8+Oi315fCJRUf5V24Q1D9X9rPczjzGnZYG/3HctAM/b/nUA2lVoPDQ5VdWBHcqNe02p1za6bo04NiakTKnZ9QsWd24t/HLVJUkns3F+cFjilwsz2N0japu9YeNPZNulZKFGFLp8tk1T6k9EKCWIFFRRWiLTgwlpU9evjYVf0f4YAPuSAxzWvIT9WXnPtrhEmXx94LkAnBlunbZvW0Gl9VpKzWPc1njcVllWmCyTs7K6S9NQoXSNzBGObrkBf/PkjaUxTWYtKal6G/mMXFtWokFs7ksfBWLRKGlNCIpoJFohD0Udw5hGoTzZLPb8x9olSWhrTFakLS5JLeYjOD4qlbcsWsmO0/IX0rEcXRHzLwUbeiVhvRIIBALrhDWU+i+/viWbq4Uvq9rq1PjgaEOdhReO2fNGNS4+k9fokWxZ3aWaRR3kNPLG1MK5wtw29EYwJd5V06C4ZEOvIeWivKb1UQD6t4jKvbMgLdmGdNy+ZmVTnIyWVj9+sr5Ct6iCWsaKczfptfaAJRWXKFDMVTeJnlT79BOnRMUNdIpfoTsusR/jxSTPTMprT41vBuDpISlre25U5jo7avbkcrx7RBtnbGrSVcs8FLrZqS0r1lKfx9SWbdFYKxX/bBnG2Wx0ml9p2ldBn89pvPq4T5Apin9npmJqCyGux76jWWbnmWbJqh3NNpejqVRdWxz/A2OyGmxW+3tvTD4bqTiIxycljr62fLNdupZd+/PdD5eikgLVBIUeCAQC64Q1o9CNqKtR6FrPIR2z7MKVsZ1lNNPN6oH4ivhmi4C4qeNpAJoj8lqt/bUell36+p6HgHL8s8W9jxbT/NQvPAHAo1+UWF5T5u+cIbqlJZJilyrhV7dLTPx9fb0A1LfyC742imDaG+S4TkyITfPro1cD0lzkoqio6O6o2EprC0F1afG0K9sk9+xA80bGbCVgpW41AzCvynlgXBT6N0/KcX+j+JyS2pzMafs7LV1cqNfmD2lV2NoiNUFu6JAmChuija+YLL65JVFdT8SiTazhRH/hqB7//OO8t8VGSiuAoSaZp5mydKva9s2kzGvIaYbsjzK9vCAp11l6Ed8Xuzb7CnLtHsqLIh/OzZL9qedlfETU9j89IwXE7m2T6JftrXJlXtZyhsdHxL5utvOZagqldc42RMeqosICZYJCDwQCgXXCmlHoVj3P7NKmQsxu1puWrMb4Mge5jHlRZtZuzlqBUXCluOxWrfx4ReoEUG4o0KPx0x1NomgGm1qmKS9T9xbvnFOb/TNqq9+f3cEv9EhFyd96z7erPjtb3LnZSC/S1nY7WsVGaao3o0qp1Gw576bHqNdQUHV8ZkJWKffmxQ762Og2rms/DMBrW34MSKQNlLNI23U8L2l5EoB/a9vNIY1RLvpqdW1jGh9sqvoLZYVsWNRJPWUOkGqZZN8mqYx4ZVLOz3xaFXZE5Pzs65SSvk+fkmzWKa3geNup58vrvfL6QlrT9USLpbo4fzj8KgAmZ1LoxvK6jupiTVMOaaTR3/XfAsCTQ+LXGNJGIhktV1w1RrOl60pqUv+eUJ/H2Wbx9TxULEf05yeqfSy1L
QsT0VA6ZC7WzA3dyNV0O3elcqXyd2631gL3q3XPj2udo9vOyBd3ssIZGteuP5e0S9q5mR3irqVqW6/eLDe5/3nyljn3a3XM//+TLwegK5Hh5vRBYH6d2I12vbH+2mZJ5PhcXMq/3vUTKd41OVXhzJzjJlFUE8nh05LU1dEuX/B4tEi2IJfO/UOyhP7QttsBSoks9gOz035gmgc5lZYwxUxef1xqurtXpu2XhliYwxSh5pSkOqq3dQ7TFBXznDk4a5OTZsPKD+xJnZHtpmRb4/3ywzhoIYGL6GS0MdpcKqNgZRUmWYJwSBU/FlhweeokqQXURDcTi93I/+DIrfK4X5P71Jk97ce1kWnWc1yqXz/b5/RGbuGzyZiVqc5zPhsXGi3psRDO31kJBAKBQBVrRqEX9LelWJOUU1SFdnpS1J0l/MzUfWah9BfEkXbHmPSsfPy07MHSqol6mpvkl/X13WISsaYLtfzTKUkIqqc8JjTt/cfaI/PpiITjPTMkjqb2VJbshoWfFnPS5bysHrYmxfnUaEOFKlRNFTXNfvCcmF4S6SkGhmU/27pl+8fyYqK6NJ6p2oSZYt7Q/QBHx8Wh+OzY9KSSGWlw2JawNTqZZGBK1PMTWSmwtTd+sOHdWQGqzTFp1BCxMFodR1ZDL1c6fHFWahY2CQ03TUWmSsczH6zsxWf6XwzAs31ybWaHa0JEF2MGqvdZOw69ViPaWGVzt5yLPW2yMt4czQAttZ9e8zRS0mOxKj0o9EAgEFgnrBmFXmpcoeFi0ZT8OhfUjntwWJxToz3LY0XvK8pUfLdvLwCTmepu7C5VYEub2D2tGFdtcs2JvKwihq01V3a6/XJKbfIff+JGoKycTWHGo4WGGjLMhNk/Txbkl/7ZjJQuyJu905R6YR7eZQsz1IdTFWFlljhyOi9lU380JSuDKzRN25pXdEXHaE/IKsjpGPwcTtmG0OOwxh1j2SRnErKSOJoWm29f2ppkiEKaLcnGwi8jOvfxGkdcUVeIs5U/nslGulNDb/sKjqenJInGwiEXhCXcpEVRm72/t13KI/RER1mIZrPvQv+UnNva78JM45DB1CQ/GbOdY7WVO00IS7fIPaCrWe4JL9ksK6w3tsvKuPs8SyqqVeaNlvRYCEGhBwKBwDphzSh0S/5467YfAvDYzQbbAAAgAElEQVTHA68EYGJchmht0izhZ6lsl8NFUY2HtMXY6VFNJLKkGxUasUSBy9uksNBiCgNZGOO0MDXdz3CqibMFGUN/4SQwd/JKwRcpqgSy5I+vDN8AwNND2rxiloJhDVNqkhGpKvIF8LkTLwDgvTu/Iy/ExbZuYaADhQ5Oj7fp5xu0wdZTfkaNHdcSvzIjKU5quOWjMbWhN4m/YnvLsTl2WMaSxKI1+zUfSF9BjuVs4WgpVHMuG+mNbZKI9v3hS3nwjIyptonDQkgk5Zy/+/J7AHhZs4SK9kSLRObbzJtyU+2TY7LqKkWzGPpddRaFoso61TRVCnWNqR1/fLhJt6Hnq1blu7Iy7+iQKKqtuhL+je0SqXVFQkKWu3VltZRlDFaS+Zb0WAhBoQcCgcA6Yc0odFM5Fp9rjYQnfLXCaKR123wYKIiS+No5Kcc6lqnx5KtC62jLcFWzJKzUFgayBIy+giRJLMYums9H+MKZ5wFw/c6vNvSZIp6jeVlp7J+U5gA/1iYBlvyRt/Z5i2lSbeqqQEml5dSePqGp+c1qp7boiuM6rq8PPrfcjLpW8Rk1iSSRRKGkPqPaRMF8AeaLKCVKFS1JyZEtyH4OObGhP9YmavjalCj0eGxuW7oliVkTk35N0bf9fuLoTQBcc8nnOZZvzEZ65bsk0enIWBcjoxKJk8/KV9Cpv6W2F0cxPXMyjVM/09ZOiQIxZX5lYuFp8f2FcU7k5drp1wJopfOu5zymNvtNXdqSMC1F0DY3jXJ9qzSo/vaApPofTMgKsV8jpEp+pYqFj10PVjLjjZulici+pPgCNkbPv4iWepgSr
4Fv6AXdX/FFrJq/FRpTu948Xc3zqzZXekM/BnQhdYi+Abx8NeYK6K25IdSdF+BjwJvrvW8lxlXz2s8Dn9P/V30H9eZ6w0qNCfgicDVwuA6NkIMAAAMOSURBVOKGvmJzVef8fQH4mTrvW7F5Wop/K2VysS+icVyfWzWcc73Ac4H7gE3e+1P60mlg0woP56PA+4CiPt4ADHnvrR7raszXLqAP+Fs1BX3COdfMKs6V9/4E8P8BR4FTwDDwEKs/VzDzvKyla/8diAKGVRyXc+7ngBPe+0dqXlrNuboUuFlNd//inHveGhjTvLkgnaLOuRbgS8B7vfcjla95+RlesVhO59xrgLPe+4dWap8NEkOWpf/Le/9cpAZPle9jFeaqE/g55MdmK9AMvGKl9t8oKz0vjeCc+xCQBz63yuNIA/8V+N3VHEcdYsjK73rgvwBfcM6dd92+V+qGfgKxmRnb9bkVxzkXR27mn/Pef1mfPuOc26KvbwHOruCQbgRudc4dBj6PmF3+Auhwzll549WYr+PAce/9ffr4i8gNfjXn6meAn3jv+7z3OeDLyPyt9lzBzPOy6te+c+7twGuAt+iPzWqO62LkB/kRvea3Aw875zav4phArvcve+F+ZLXcvcpjmjcrdUN/ALhEoxESwJuAr6/QvkvoL+4ngSe9939W8dLXgbfp/9+G2NZXBO/9B7332733vci83OW9fwvwPeAXVmNMOq7TwDHn3GX61E8DT7CKc4WYWq53zqX1XNqYVnWulJnm5evAL2kEx/XAcIVpZtlxzr0CMefd6r3P1Iz3Tc65pHNuF3AJcP9yj8d7/5j3fqP3vlev+eNIoMJpVneuvoo4RnHOXYoEAfSzSvO0YFbKWI94sA8iXuIPrYbDALgJWQo/CuzXf69CbNZ3Ak8jnu6uVRrfLZSjXHYjF84zwD+i3vcVHs81wIM6X18FOld7roDfB54Cfgx8Fok+WNG5Am5DbPg55Ib0zpnmBXFw/6Ve948B163wuJ5BbMB2vf91xfs/pOM6ALxypcZU8/phyk7RFZmrGeYpAfxvva4eBl66kvO0VP9C6n8gEAisEy5Ip2ggEAisR8INPRAIBNYJ4YYeCAQC64RwQw8EAoF1QrihBwKBwDoh3NADgUBgnRBu6IFAILBO+D+JHnJkn0oebAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plt.imshow(cv2.imread(directory+images[0], 0).astype(np.float32)/255.)\n", - "plt.title(labels[0])\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "charset = '0123456789+-*()'\n", - "num_classes = len(charset) + 2\n", - "encode_maps = {}\n", - "decode_maps = {}\n", - "for i, char in enumerate(charset, 1):\n", - " encode_maps[char] = i\n", - " decode_maps[i] = char\n", - "\n", - "SPACE_INDEX = 0\n", - "SPACE_TOKEN = ''\n", - "encode_maps[SPACE_TOKEN] = SPACE_INDEX\n", - "decode_maps[SPACE_INDEX] = SPACE_TOKEN" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - 
"data": { - "text/plain": [ - "[14, 5, 13, 9, 15, 11, 9]" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "[SPACE_INDEX if labels[0] == SPACE_TOKEN else encode_maps[c] for c in labels[0]]" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "image_height = 60\n", - "image_width = 180\n", - "image_channel = 1\n", - "max_stepsize = 64\n", - "num_hidden = 128\n", - "epoch = 500\n", - "batch_size = 10\n", - "initial_learning_rate = 1e-2" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " def __init__(self):\n", - " self.X = tf.placeholder(tf.float32, [None, image_height, image_width, image_channel])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " self.SEQ_LEN = tf.placeholder(tf.int32, [None])\n", - " filters = [64, 128, 128, max_stepsize]\n", - " strides = [1, 2]\n", - " x = self.conv2d(self.X, 'cnn-1', 3, 1, filters[0], strides[0])\n", - " x = self.batch_norm('bn1', x)\n", - " x = self.leaky_relu(x, 0.01)\n", - " x = self.max_pool(x, 2, strides[1])\n", - " x = self.conv2d(x, 'cnn-2', 3, filters[0], filters[1], strides[0])\n", - " x = self.batch_norm('bn2', x)\n", - " x = self.leaky_relu(x, 0.01)\n", - " x = self.max_pool(x, 2, strides[1])\n", - " x = self.conv2d(x, 'cnn-3', 3, filters[1], filters[2], strides[0])\n", - " x = self.batch_norm('bn3', x)\n", - " x = self.leaky_relu(x, 0.01)\n", - " x = self.max_pool(x, 2, strides[1])\n", - " x = self.conv2d(x, 'cnn-4', 3, filters[2], filters[3], strides[0])\n", - " x = self.batch_norm('bn4', x)\n", - " x = self.leaky_relu(x, 0.01)\n", - " x = self.max_pool(x, 2, strides[1])\n", - " x = tf.reshape(x, [batch_size, -1, filters[3]])\n", - " x = tf.transpose(x, [0, 2, 1])\n", - " x.set_shape([batch_size, filters[3], 48])\n", - " cell = tf.contrib.rnn.LSTMCell(num_hidden, state_is_tuple=True)\n", - " cell = 
tf.contrib.rnn.DropoutWrapper(cell=cell, output_keep_prob=0.8)\n", - " cell1 = tf.contrib.rnn.LSTMCell(num_hidden, state_is_tuple=True)\n", - " cell1 = tf.contrib.rnn.DropoutWrapper(cell=cell1, output_keep_prob=0.8)\n", - " stack = tf.contrib.rnn.MultiRNNCell([cell, cell1], state_is_tuple=True)\n", - " outputs, _ = tf.nn.dynamic_rnn(stack, x, self.SEQ_LEN, dtype=tf.float32)\n", - " outputs = tf.reshape(outputs, [-1, num_hidden])\n", - " W = tf.get_variable(name='W',\n", - " shape=[num_hidden, num_classes],\n", - " dtype=tf.float32,\n", - " initializer=tf.contrib.layers.xavier_initializer())\n", - " b = tf.get_variable(name='b',\n", - " shape=[num_classes],\n", - " dtype=tf.float32,\n", - " initializer=tf.constant_initializer())\n", - " self.logits = tf.matmul(outputs, W) + b\n", - " shape = tf.shape(x)\n", - " self.logits = tf.reshape(self.logits, [shape[0], -1, num_classes])\n", - " self.logits = tf.transpose(self.logits, (1, 0, 2))\n", - " self.global_step = tf.Variable(0, trainable=False)\n", - " self.loss = tf.nn.ctc_loss(labels=self.Y,\n", - " inputs=self.logits,\n", - " sequence_length=self.SEQ_LEN)\n", - " self.cost = tf.reduce_mean(self.loss)\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate=initial_learning_rate).minimize(self.cost)\n", - " self.decoded, self.log_prob = tf.nn.ctc_beam_search_decoder(self.logits,\n", - " self.SEQ_LEN,\n", - " merge_repeated=False)\n", - " self.dense_decoded = tf.sparse_tensor_to_dense(self.decoded[0], default_value=-1)\n", - " \n", - " \n", - " def conv2d(self, x, name, filter_size, channel_in, channel_out, strides):\n", - " with tf.variable_scope(name):\n", - " kernel = tf.get_variable(name='W',\n", - " shape=[filter_size, filter_size, channel_in, channel_out],\n", - " dtype=tf.float32,\n", - " initializer=tf.contrib.layers.xavier_initializer())\n", - " b = tf.get_variable(name='b',\n", - " shape=[channel_out,],\n", - " dtype=tf.float32,\n", - " initializer=tf.constant_initializer())\n", - " return 
tf.nn.conv2d(x, kernel, [1, strides, strides, 1], padding='SAME') + b\n", - " \n", - " def batch_norm(self, name, x):\n", - " with tf.variable_scope(name):\n", - " params_shape = [x.get_shape()[-1]]\n", - " beta = tf.get_variable('beta', params_shape, tf.float32,\n", - " initializer=tf.constant_initializer(0.0, tf.float32))\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32,\n", - " initializer=tf.constant_initializer(1.0, tf.float32))\n", - " mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')\n", - " x_bn = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 0.001)\n", - " x_bn.set_shape(x.get_shape())\n", - " return x_bn\n", - " \n", - " def leaky_relu(self, x, leak=0):\n", - " return tf.where(tf.less(x, 0.0), leak * x, x, name='leaky_relu')\n", - " \n", - " def max_pool(self, x, size, strides):\n", - " return tf.nn.max_pool(x, \n", - " ksize=[1, size, size, 1],\n", - " strides=[1, strides, strides, 1],\n", - " padding='SAME',\n", - " name='max_pool')" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def accuracy_calculation(original_seq, decoded_seq, ignore_value=-1):\n", - " count = 0\n", - " for i, origin_label in enumerate(original_seq):\n", - " decoded_label = [j for j in decoded_seq[i] if j != ignore_value]\n", - " if origin_label == decoded_label:\n", - " count += 1\n", - " return count * 1.0 / len(original_seq)\n", - "\n", - "def sparse_tuple_from_label(sequences, dtype=np.int32):\n", - " indices, values = [], []\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - 
"outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model()\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch 100, avg loss 2.157805, avg acc 0.400000, time taken 0.350654 s\n", - "epoch 200, avg loss 0.076817, avg acc 1.000000, time taken 0.347520 s\n", - "epoch 300, avg loss 0.027402, avg acc 1.000000, time taken 0.349331 s\n", - "epoch 400, avg loss 0.022630, avg acc 1.000000, time taken 0.344544 s\n", - "epoch 500, avg loss 0.182347, avg acc 0.933333, time taken 0.345036 s\n" - ] - } - ], - "source": [ - "for i in range(epoch):\n", - " total_lost, total_acc = 0, 0\n", - " last_time = time.time()\n", - " for k in range(0, (len(images)//batch_size)*batch_size, batch_size):\n", - " batch_x = np.zeros((batch_size,image_height, image_width, image_channel))\n", - " batch_label = []\n", - " for n in range(batch_size):\n", - " batch_x[n] = (cv2.imread(directory+images[k+n], 0).astype(np.float32)/255.).reshape((60,180,1))\n", - " batch_label.append([SPACE_INDEX if labels[0] == SPACE_TOKEN else encode_maps[c] for c in labels[k+n]])\n", - " batch_len = np.asarray([max_stepsize for _ in [1]*batch_size], dtype=np.int64)\n", - " batch_y = sparse_tuple_from_label(batch_label)\n", - " feed = {model.X: batch_x,\n", - " model.Y: batch_y,\n", - " model.SEQ_LEN: batch_len}\n", - " decoded, loss, _ = sess.run([model.dense_decoded,model.cost,model.optimizer],\n", - " feed_dict = feed)\n", - " acc = accuracy_calculation(batch_label, decoded,ignore_value=-1)\n", - " total_lost += loss\n", - " total_acc += acc\n", - " total_lost /= (len(images)//batch_size)\n", - " total_acc /= (len(images)//batch_size)\n", - " if (i+1) % 100 == 0:\n", - " print('epoch %d, avg loss %f, avg acc %f, time taken %f s'%(i+1,total_lost,total_acc,time.time()-last_time))" - ] - }, - { - 
"cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAACeCAYAAAAiy/EDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAIABJREFUeJztvXl8ZFd55/09tau0q6Xe3a1ut+22jWPTNmBjGwwJYXcIExgIL4EAQ5JJmJBkhmV435B18pJ534TMZ5IJBEiAISaEPcRgwMYBJ8YrbRsv3bab3jeptatUUi1n/niepzaVpNKuVp/v59MfdW33nnvurVu/86zOe08gEAgEzn8iqz2AQCAQCCwN4YYeCAQC64RwQw8EAoF1QrihBwKBwDoh3NADgUBgnRBu6IFAILBOCDf0wHmFc+5PnHPvXcbtb3LOPemcSy7XPgKB5SLc0APnDc65HuCXgI/Vee13nXPeOfczNc/f4px7e81zvc65251zg8650865/+mciwF4788A3wPevYDxOefcHznnTjjnhp1zdzvnrpzvdgKBhRJu6IHzibcDt3vvJyqfdM5dDLwBOFXx3Oucc++uePx659yv6MO/As4CW4BrgBcD/7Fik58DfoU66I/B4RnG9wbgHcDNQBdwL/DZBo8tEFg04YYeOJ94JfAvdZ7/S+D9wFTFc18D8sB/A94LXAz8rb62C/iC9z7rvT8NfAuoVNL3AbudczvnOb5dwD3e+0Pe+wLwv4Er5rmNQGDBhBt64HziKuBA5RPOuTcAk9772+u8v7KuRbHi8UeBNznn0s65bcgPxbdKH/I+DzwDXD3P8X0euNg5d6lzLg68rXK7gcByE1vtAQQC86ADGLUHzrlWRIG/rM57XwckgA8BO4ERxBzyMeD7iI18BIgCnwa+WvP5Ud3ffDgF3IP86BSAY8BL57mNQGDBBIUeOJ8YBForHv8e8Fnv/eHaN3rvv+K9/xiqyr33X/bef8w5F0FU85eBZqAb6AQ+UrOJVmAIwDn3i865IefcEPAosMMe678d+pnfBZ4HXASkgN8H7nLOpRd/6IHA3LhQbTFwvuCc+y7wt977z+nj/cB2xFYO0AMMAx/x3tfeoG0b3UAf0OG9H9bnXgf8kff+Ofo4hqj3y733R2o+3wvc7b3vrbPtbwDf8d7/RcVzQ8DPeO8fXOBhBwINExR64HzidiQixfhp4DlIpMo1wEkkOuUvZ9qA974f+Anwa865mHOuA7F1P1rxtucDh2tv5g3wAPAGjWWPOOfeCsQRe3wgsOwEG3rgfOIzwH7nXJP3fsJ7f67yRedcARj03o/NsZ3XI47R9yO27ruA36p4/S3AXy9gfB8BNgL7EXPOM8C/894PLWBbgcC8CSaXwHmFc+6/AWe99x9dpu1vREIjn+u9zy7HPgKB5SLc0AOBQGCdEGzogUAgsE5Y1A3dOfcK59wB59wzzrkPLNWgAoFAIDB/Fmxycc5FgYNIUsdxxMP/Zu/9E0s3vEAgEAg0ymKiXJ4PPOO9PwTgnPs88HPAjDf07q6o770ovohdBgIL57HhbgBcVETMc1rOzfb2wApx8NH6eVeX/lRmhUeydnno0cl+733PXO9bzA19G5LabBwHXlD7Jq14926AHdti3H/HRYvYZSCwcHbd/i4Akm2TANx/UyiEuBZ4+dZr6j5/xx37V3gka5folmcayolY9jh07/3HgY8DXHd1KoTUBFacb2dkVdizZRiAiSl5/CvHbwDgQ5u/w45Yy+oM7gLnockpLn4gBcA9t+0D4KY3P1x6DeDaZGJ1Bnceshin6AmkZoWxXZ8LBAKBwCqwGIX+AHCJc24XciN/E/CLSzKqdULOF+gvVPViIO4cAN3R5tUY0gVJwhWqHo+PiiLsy4oq7ysk2BFypleFa5MJ6P4+AO98z/envxaYFwu+jL33eefcbwB3ICVIP+W9f3zJR
hYIBAKBebEoXaJNBeo1FrigOVsYB+B0IcoPMpcD8OT4VgB+ved7AHRHV2dsFyLpiDhBUzEtyqienP4JUehZH+T5ahKU+NIRruQlYLgoZpX+gizt90/Kzfuvjt5C35iYVra2jQBwpktuIpdTXOFRXrg0O7mRX9LeB8CJPulb0Tci5+JwrodL4kcB2BhMYYHzmJD6HwgEAuuENa/Qx4r1C961RFIrPJKZGVBl/tfnbgbgrhOXADAymiY/IVM8EM/X/3Bg2emJympoU1JWSclUDoBsRpb6nzh6E9dc8nkANgZTWOA8Jij0QCAQWCeseYXeVxBl+7Wx5wDwwvTTAOyOja966F/OizI/VhBb7OPDWwAYHJTHPluWe/mC/HZOeXsu2NBXirSTOX9R6wEAvpUQR/XEkKzyhjJNHM53ArArJio+HQmOusD5R1DogUAgsE5YswrdbOdH8m0A3N1/GQDf9aKuPrjjnwEJD5yPUrd04loWEjplSUMPT8jq4fC5LgD8lP5OVhQ68N7Ne/urzVLO1Wpi/padsUEAdndKUa4fjUhRqPGJBD/K9AKwLyG9nINCn04j10PBy8pzUCO/MlrNNaffhfaI079yTuIuOC2WkqDQA4FAYJ2wZhV6Vu3T3xsTRX50WGKHcwX5Rf/v7pX8yc6vAI0n6Tw0OcUn+18ETC8EZOnH81GffUWZvn8+fZWMWaMmKExX486dP3XJTInNNlfnm0oH2BqT8/J/bf4hAE/1bQIgM5bkhwO7ALi1TSr8bVmF8a1VGrkeAPbEC6WIr0emNgPwzUH5bkT1+r80fRqAvclTAKRcjp6orLQtGinkAiycoNADgUBgnbBmFLrZ3sa8pGkfyDUB8Oy41HQfyyQByKkKHkhPcKbQWNZlpcJ44sOiGLqQWGRTG7WFgWbDMkOfnhKFd2ZUo1qmZv59jEXXflRLrRJbirkyKguV2cpmtFit8jdHpaGB2VmLwLma1U6PNqdYSIRTe0Suqa1qS9/RKX+fGtnCkUGJcjm0TZpg7Iz1VX3mQqHSTv5QtheA/WM7gOnXw7/+vVwPr/51WdWcKxS4rf96AJ7VZiID4+KnmJqSZfT3ExcD0J6WayFXiLK9dQiAP9zxdSDkAiyGoNADgUBgnbBmFHpRQ0JO5uXvPw4+H4Anz20EID8ZszcCMD41/1Z299y2r6QujJIdcB5YzZZ/7LsOgAldPdSznRsWh34+YEp8KebKopXOFPLcmbkUgK+elg41I5MS6dCS0OJZUck5+MUtYuPO+RifPSGKLx0T5firW+8G4KqEqOvWiFwX88kc7tFiXV1JWRHEknkm9fr6p3Mytn1bvwVA+wqdNstpqGWlokAqV2d2/ntedRyA0c9uAyBNdbbzvn//GEApQuibJ6/g1Bnxdflx/b4m9Qur3w27osbRtnMRz9iEfH++s+FyffVJ2X+wqc+b8+cuEwgEAoFZWTMK3Rj1orxHc6K4Mln59fZ5/e3RX3rv3byzLm9688PTPPTvXEB0y+mCqIszmVaAUr2W9YLNzWLmqlKZA3xy4IXcdUoU+oDGf+ezcq69ntNoSlTq7/W9Vp73jnxOznEsLq99tPAyAG7d9AgAr289CECTL18DUTe7TumKyjbfvFFWAk/0b2JoSFTg4JTYzPsKcowr1fhi0ud0vzJfPVHZ8XIr9Hp+E1uZjZ4TZd76VmlE1nf7dqB8Pbxpg8zf98f2AnDyyAZ6JfCMM9fJ+Cc3yHkppuuvQCi4UnTYx5+8CYDvdlfmmkDaybW0luo3rVXWzJ2ovOSUL/dEQb/sFu1nf9UpFosWp3WimYnSDaj7+4vqimLO0L6COGqHJ/QCm8XUYkxMJvSzkijVX5ByratdvqASm4u+zh8BcP27n6l6fWtUvvyZit/P2gQcc24PFOXGdNuwmKX2D20v3chz4/oZO6c6f4UxuRwnxiouS53afEyee6YgzrbvJ+THoSc2CsAlibMA7Ix52t3sjkxzdG6OSpr/7s5zp
SQj+5Fe7hrpVjO/T01xVnqgTxPprkqKuWNnbP7Jcwuhnpntut+Q6+CaFrlWr33PF6s+Y3O0f1hu9Hv+Pk+8X8xYW3Jamvg1c5tG/YT8aE1MyHk5oMEGfx79WQDet/2bgJT7gLX1nVlrBJNLIBAIrBPWjEIf0yXnwxOivA4NbwBgKlOtAF1CFGBrcrLUiQYac5AuNhnGnKFfOPs8AEbHGg9pm5yQMf7d8RcCcM3FosDWUueiflWNaf2Z/8LplwIQczLnG7aNAdAayXJ1YmL6Biowk8WDgzsBONS3oezYthVNI7lW9p6cOtX0enj4hKjCp9Rp/sIthwH45e4fcFm8sSV6j6442uNZouqAy+ZkjOPFpO24gUHOH1Pmf3ziVQAcGRWFbgk421sklO//2SZmh+W+TirNbDOb2Kq/P9/SYIBnb5Pv7KbMaGlJfep6NZVGZF5dk3x3XERe90U1nWaj064DM2E+ekIaxfxB4VYAfn/n1wBIuWwwv8xAUOiBQCCwTlgzCt0Uy92DUoTL7K3mMDNbaiwhttnL28+UWos1qtAXQ8EXS85QU1PTnKFq36c4XYFaKd3RSVEufQWzAy5vwpGp7lrq2SHP6Dn406OvBODgafEVRPW4PumkgcebNt/P3nj97U54Ub19hXbZ/4TsZyqTmF60zObLXBCNKHdV6pODotAm4zKf382KSsz5CP9p410A7IzNrtS7NOTxVV2Pcu/xXgCKxZXROHb+7Vo63S/zFVEFG1Vl22jy3EKp9C9BddLYTCtau6aeyO4GYGSPjK3rQIJTL1cfxCZR5JEuuR66O8XXcXW3OFh/1CcrrL7T7ZCv8UHpY/O1HB6QonefbBbH7a/3fG/Oc3uhEhR6IBAIrBPWjEIfV4/5uawol/yUDs1EnNrON3aIHXdzcriUeLBS5HSMhWJ9pRlJiiopWmOLyugXfe9UXl5brkYX/aXoCdn3gZxEDHx7SEr8vr37B4DYIZucKCAL8zPVeHZcVGFuTF4vaDhhViOPssU4yRlCAzMarXTv+B55rAlgvujK82WLrrgcezJdbaeOa7u+XC5W8j1MK6tgqyBT7GOi1B88cxF/FxM/xXv0WFtmkC2m7jZEx2hKipK0aKShoqzG+gsngaWJrLBIrsFilhN5SacftbDcQrncAcBUYWWdK/MqSqdj/f45abVYbJPz9ZNb49gR1Crz6zcdBuCK9Mmqbd2bizE+Judh2vdGr5cJ9Zvcd1rmrCdxHb/aeR8w87m9UAnTEQgEAuuENaPQp1GstquZ7by7SRToS1qeKLUWWwnG/CTnCpLWnKkpO5BslWgbs39a7E1xvDy9TlVuiyrBbdrqDJa2+NNwUcbw0bM/A5RtlamYJvh4tUNu/B47YzLSyaKoqpJqzFTH11tSV0QjMC5OnCXOPOfeU7aVx2Q7psy3dg4DcFPPswA8N30EkEiTTx27UcakvgeLQhkbUh/LpGoStbsODrawPy3HfLhDVhpdkbntrSldFYyMynb//pSUHLhqniWaZ8MU+v7JDj538gVyPBOiPn1OjsOrW2hwTMZxQuPTzxaOrpkU+IGizOPwlPx1ej59U6F0bpvScm3tbpdmInubpFzuzU2S23Bjk5zrV3b28Hcn5Rwf7BOfTVZXW3ZuTbmPJ+T54xOdZDqW5dDOe4JCDwQCgXXCmlHoUTWYtWoMMdYQwmlauNqpNyRFoXdEJmlZgdKm5VZ4js+drlZVpjjTKVHdz9ssGXXfPyL24+z49Ol93bZHlmWcZjvfPymxu6bM+/sk8zCRljGm4xIx8KnoTfxm990AjKpv4BNHJfXa7NaG2XeHsuX5nqlFm62abmgWJfbN6BW6kfJ7LBa5vVli2d9x0b8C8NImUeZWcCvjC1yzR+L1H8peBMDt535KjnNK0tInCxovni+vJgYyMs47R68E4JI57K2bomO0J7VUgZdok9PjEq3Rp7b0y+t/dF4MF+UcHJraxelR2X7O5rom29jS4e2cXHPJ5
6eVlbXM5Ulf7YfpXKb2bvZdyHmZk5IvKVI+ubGULDGu2iiK/N2b/wWAy+KyIq0tprY91sdujS//dIv4Pm4/JOctm68ueldQ/1PpHhGYRlDogUAgsE5YMwo97mqiPWqaKqdTYpOzX+fWSGN1XBaLtcL7+si19E+ITdYyHp2uGja2SOTNRSkp6ZpKiG04S4XNVg/nn09LtMnL9zy+pOO0yINPnhBFNzCkTTf0ecu4PaKNrLtT4xzISTbuOY11Pj0kat5qa5TQbZjvQKJ96kfnmPJKuZoMywqfSCwpKm5ri9jO9yWPAbAxmq76SJODTlV/aXcYgFNtYlMemhQVfqgox5AbUTVXhNFxGcOBMYnw6WuX8zVTW7meqOed2+4B4MODUhhsRP0Ix3SOzsaP6hgXbsfOqJC9b2Q3WfPDzBDkZHkLIxoFcyLfxpWJ6jm1dm9fG5Nr6pqUjPE6LQu81Aq9oMushyd6ZftR/Q7auY16Wprl+/mWTfcC8FxdUbc4Obe1hdPaI03s0dLIP9cpGaqPb5AzdWB0i+1YdqP7mSrGyPo1lGK9hpjzhu6cuwj4DLAJWTh/3Hv/F865LuAfgF7gMPBG7/3gQgdiJyiTVyeRnjy7abbpDf31ndKV3ZJClhtzMuZ8lL6RlqqxRTVM0ep4x2cpFraciUVjxSyH8nKjPjMmYyxMVIeA2Y09F5d5G5hM8+yUpM3fdlRKGdSaWgyXrHboSg/IeZq7KkwuVjnxTZvv1+1pMk29UEg1J1j1wVtbxWSV0+vlxLCYSHJmAiq60g+uFdqq7YxUS3e0mc0xSbXf3iF/D52RImC1pRo2aMLPXBUd6xEp/fUlB/pc+AphU1vQ6wFN7PnhkPxNdcoNf3f8ALD0IX1m2rGSG0NanK70XY0V2dYuP9JmQk3XhMbWw8x3GyJiQmqJT5a2B+DV1KLWVw6NbgDxn5aKwS3kfKxHGpmFPPA73vsrgOuBX3fOXQF8ALjTe38JcKc+DgQCgcAqMafM9d6fAk7p/0edc08C24CfA27Rt30auBt4/0IHkrU66FPlpbOMULuFt0t51J6ILCdbItXL8+Viystv3oGxTaXa3KY24xpKuT0tqq4zVj8dvvIzhZpwzKUg4wvcOyZJHlP52U9pUcPjRiZTfKdPHJa2rJ9malESTaL8fqVX0sLnk9BVOl5HaQ7MJNWh57JzFue2Ka8WJ2rwIk353qhOtoSGY1aWW7AQQEvOyZVK4c487q3az9S6GB3S5/vHZCX1dE4k4a7YAFBWnvPB6rC/dsN+7j+1o3rctdQR8KbMP3xMzEIHz8mYLOQSpGzGzelnpn12KTCHdyoi5y9u10GpxLUjr45SK26W8fLdmKukMUBKV7gW+JDU6y6rSWVWSnt36znOFW17a79X70oyr3WKc64XeC5wH7BJb/YApxGTTL3PvNs596Bz7sG+cytj9w4EAoELkYYN0c65FuBLwHu99yPOlZWF99475+oaBb33Hwc+DnDd1akZDYdFVcKlUKga5ZIvijoosvQKtx7ljjtikz6XbaYwVa1gW9Pynld2il03O4etFso20UZS/ys7sFdSm6Y9XvSlhiD5/AzOIp35kpO0EKW3RZI+LKFjJiwss0cbQszHMRivo+bNIXhIbfh7E2Lz7WqgP2hENUhvvA+AqNmirdBXRaGnwjwKbZl6fl2POOb2n5KwyHEtEfvg+C4A9iVPA7BjhrDN2bDGGh2RDE26Shmbx+fN73J0RBzDowPyeDyhhbD0K/hUj2irnuipJU1GSjo5b73xfgCa4uqktcVR0ZV8Gne2XlH13kZLGgNsTsp1VloRK9GYHOfQVBMdEQtdXFxJ7PVGQ1e8cy6O3Mw/573/sj59xjm3RV/fApxdniEGAoFAoBEaiXJxwCeBJ733f1bx0teBtwH/r/792mIGEqkNWzTlpUr90KiEj2U3rUy40qi2UPvemKSU9I81l9StRd60p0QlWCuzc4gammGxApTVqRV/Gi6K4
muvsCPX9nms7e1JI709Xc3fmnZvYxPJUt/W4kx23LhGKpjNW+3MjZQrjmtp45hGhRDxpX1PTcrn/+GEtKfr3yTRKD/fJi3PNkXFhppy0VKonGGRFjktPdCc0HK9lW+zKDrd97Rrqw6V6hkgqXbprJZwfXBAbN5v7HgAgB1zbnFmpny0YV+KFQv7cfai0nGMTVT7mSw1/pyGqv7NcSlzvHf3F6clIy0GC4O0KJekRnc5XYX5XITMuIztnhO7qj77nh4pabxHI5wiepKiLlKKVLFIt8eGJTmukKvRm7q6zRZi5HRFH6JbqmnE5HIj8FbgMefcfn3uvyI38i84594JHAHeuDxDDAQCgUAjNBLlcg/MaLj+6aUaSFTlRskuZ/uvSYkurJANfUjtr8+Oi315fCJRUf5V24Q1D9X9rPczjzGnZYG/3HctAM/b/nUA2lVoPDQ5VdWBHcqNe02p1za6bo04NiakTKnZ9QsWd24t/HLVJUkns3F+cFjilwsz2N0japu9YeNPZNulZKFGFLp8tk1T6k9EKCWIFFRRWiLTgwlpU9evjYVf0f4YAPuSAxzWvIT9WXnPtrhEmXx94LkAnBlunbZvW0Gl9VpKzWPc1njcVllWmCyTs7K6S9NQoXSNzBGObrkBf/PkjaUxTWYtKal6G/mMXFtWokFs7ksfBWLRKGlNCIpoJFohD0Udw5hGoTzZLPb8x9olSWhrTFakLS5JLeYjOD4qlbcsWsmO0/IX0rEcXRHzLwUbeiVhvRIIBALrhDWU+i+/viWbq4Uvq9rq1PjgaEOdhReO2fNGNS4+k9fokWxZ3aWaRR3kNPLG1MK5wtw29EYwJd5V06C4ZEOvIeWivKb1UQD6t4jKvbMgLdmGdNy+ZmVTnIyWVj9+sr5Ct6iCWsaKczfptfaAJRWXKFDMVTeJnlT79BOnRMUNdIpfoTsusR/jxSTPTMprT41vBuDpISlre25U5jo7avbkcrx7RBtnbGrSVcs8FLrZqS0r1lKfx9SWbdFYKxX/bBnG2Wx0ml9p2ldBn89pvPq4T5Apin9npmJqCyGux76jWWbnmWbJqh3NNpejqVRdWxz/A2OyGmxW+3tvTD4bqTiIxycljr62fLNdupZd+/PdD5eikgLVBIUeCAQC64Q1o9CNqKtR6FrPIR2z7MKVsZ1lNNPN6oH4ivhmi4C4qeNpAJoj8lqt/bUell36+p6HgHL8s8W9jxbT/NQvPAHAo1+UWF5T5u+cIbqlJZJilyrhV7dLTPx9fb0A1LfyC742imDaG+S4TkyITfPro1cD0lzkoqio6O6o2EprC0F1afG0K9sk9+xA80bGbCVgpW41AzCvynlgXBT6N0/KcX+j+JyS2pzMafs7LV1cqNfmD2lV2NoiNUFu6JAmChuija+YLL65JVFdT8SiTazhRH/hqB7//OO8t8VGSiuAoSaZp5mydKva9s2kzGvIaYbsjzK9vCAp11l6Ed8Xuzb7CnLtHsqLIh/OzZL9qedlfETU9j89IwXE7m2T6JftrXJlXtZyhsdHxL5utvOZagqldc42RMeqosICZYJCDwQCgXXCmlHoVj3P7NKmQsxu1puWrMb4Mge5jHlRZtZuzlqBUXCluOxWrfx4ReoEUG4o0KPx0x1NomgGm1qmKS9T9xbvnFOb/TNqq9+f3cEv9EhFyd96z7erPjtb3LnZSC/S1nY7WsVGaao3o0qp1Gw576bHqNdQUHV8ZkJWKffmxQ762Og2rms/DMBrW34MSKQNlLNI23U8L2l5EoB/a9vNIY1RLvpqdW1jGh9sqvoLZYVsWNRJPWUOkGqZZN8mqYx4ZVLOz3xaFXZE5Pzs65SSvk+fkmzWKa3geNup58vrvfL6QlrT9USLpbo4fzj8KgAmZ1LoxvK6jupiTVMOaaTR3/XfAsCTQ+LXGNJGIhktV1w1RrOl60pqUv+eUJ/H2Wbx9TxULEf05yeqfSy1LQsT0VA6ZC7WzA3dyNV0O3elcqXyd
2631gL3q3XPj2udo9vOyBd3ssIZGteuP5e0S9q5mR3irqVqW6/eLDe5/3nyljn3a3XM//+TLwegK5Hh5vRBYH6d2I12vbH+2mZJ5PhcXMq/3vUTKd41OVXhzJzjJlFUE8nh05LU1dEuX/B4tEi2IJfO/UOyhP7QttsBSoks9gOz035gmgc5lZYwxUxef1xqurtXpu2XhliYwxSh5pSkOqq3dQ7TFBXznDk4a5OTZsPKD+xJnZHtpmRb4/3ywzhoIYGL6GS0MdpcKqNgZRUmWYJwSBU/FlhweeokqQXURDcTi93I/+DIrfK4X5P71Jk97ce1kWnWc1yqXz/b5/RGbuGzyZiVqc5zPhsXGi3psRDO31kJBAKBQBVrRqEX9LelWJOUU1SFdnpS1J0l/MzUfWah9BfEkXbHmPSsfPy07MHSqol6mpvkl/X13WISsaYLtfzTKUkIqqc8JjTt/cfaI/PpiITjPTMkjqb2VJbshoWfFnPS5bysHrYmxfnUaEOFKlRNFTXNfvCcmF4S6SkGhmU/27pl+8fyYqK6NJ6p2oSZYt7Q/QBHx8Wh+OzY9KSSGWlw2JawNTqZZGBK1PMTWSmwtTd+sOHdWQGqzTFp1BCxMFodR1ZDL1c6fHFWahY2CQ03TUWmSsczH6zsxWf6XwzAs31ybWaHa0JEF2MGqvdZOw69ViPaWGVzt5yLPW2yMt4czQAttZ9e8zRS0mOxKj0o9EAgEFgnrBmFXmpcoeFi0ZT8OhfUjntwWJxToz3LY0XvK8pUfLdvLwCTmepu7C5VYEub2D2tGFdtcs2JvKwihq01V3a6/XJKbfIff+JGoKycTWHGo4WGGjLMhNk/Txbkl/7ZjJQuyJu905R6YR7eZQsz1IdTFWFlljhyOi9lU380JSuDKzRN25pXdEXHaE/IKsjpGPwcTtmG0OOwxh1j2SRnErKSOJoWm29f2ppkiEKaLcnGwi8jOvfxGkdcUVeIs5U/nslGulNDb/sKjqenJInGwiEXhCXcpEVRm72/t13KI/RER1mIZrPvQv+UnNva78JM45DB1CQ/GbOdY7WVO00IS7fIPaCrWe4JL9ksK6w3tsvKuPs8SyqqVeaNlvRYCEGhBwKBwDphzSh0S/5467YfAvDYzQbbAAAgAElEQVTHA68EYGJchmht0izhZ6lsl8NFUY2HtMXY6VFNJLKkGxUasUSBy9uksNBiCgNZGOO0MDXdz3CqibMFGUN/4SQwd/JKwRcpqgSy5I+vDN8AwNND2rxiloJhDVNqkhGpKvIF8LkTLwDgvTu/Iy/ExbZuYaADhQ5Oj7fp5xu0wdZTfkaNHdcSvzIjKU5quOWjMbWhN4m/YnvLsTl2WMaSxKI1+zUfSF9BjuVs4WgpVHMuG+mNbZKI9v3hS3nwjIyptonDQkgk5Zy/+/J7AHhZs4SK9kSLRObbzJtyU+2TY7LqKkWzGPpddRaFoso61TRVCnWNqR1/fLhJt6Hnq1blu7Iy7+iQKKqtuhL+je0SqXVFQkKWu3VltZRlDFaS+Zb0WAhBoQcCgcA6Yc0odFM5Fp9rjYQnfLXCaKR123wYKIiS+No5Kcc6lqnx5KtC62jLcFWzJKzUFgayBIy+giRJLMYums9H+MKZ5wFw/c6vNvSZIp6jeVlp7J+U5gA/1iYBlvyRt/Z5i2lSbeqqQEml5dSePqGp+c1qp7boiuM6rq8PPrfcjLpW8Rk1iSSRRKGkPqPaRMF8AeaLKCVKFS1JyZEtyH4OObGhP9YmavjalCj0eGxuW7oliVkTk35N0bf9fuLoTQBcc8nnOZZvzEZ65bsk0enIWBcjoxKJk8/KV9Cpv6W2F0cxPXMyjVM/09ZOiQIxZX5lYuFp8f2FcU7k5drp1wJopfOu5zymNvtNXdqSMC1F0DY3jXJ9qzSo/vaApPofTMgKsV8jpEp+pYqFj10PVjLjjZulici+pPgCNkbPv4iWepgSr125zVTSYyEEhR4IBALrhDWj0Odit
qYRC8GiQY5oZMrxcYmjzk9WT4nZCVuTk/Ro1mOcaoVu2Z6fPSGlXS1iZV5U2Go3p0TxjM/Q5LhcyEuU0kAxyv5JaQDxd8clM/TUsByXFbPyattcshTyQv0QFcvOtGiRAbXHns22lMYybQw1yjzVKnb3rZ3D3NAtzTUsU/jAmKSdH9NCTgNWRtcyD/OuFJVjGY0/OH0xABenpO3t9rZn5jy8Hl0RvH37vwHwh+c0RX9QIphstdFXaKZVC3rV2kijWS0ze6scw9MTMvbRqWTpenZjMk+ps5pf8UM59lPXy/YnJQScYrpQUuSJJtn+lk65Ht/f+00ANtVpyD1fzhQifOq4rD6ymWrFaOUVSsXPtPHJazukdWBvrNzy+pqUrGb/PPIyAB7QVev4lK4eKv0oOuyUlsVoVh9VywJi6NciJeWtSrw2mmUplLkRFHogEAisE9acQrfoghJWp2eRTSNqqdcEGiqUrI1H67fsaBlkb7wfgLgTlXFWbeePTYnN8cSwRAUU5iq0VA89znRqiotSFkdc/5gzWnfmjowUy3o8s42fjIu92JR5RosglY5nPuItuoBY4jnI5BPliI6a7Zrya24X5ff8LVKa9m0b72Gb1suxszLaLpfs/ux2AD57QqJ5jqOlVzPxUoSSNe4YHpfzdXyqCyj7TdpnkTPm09mmjRhKNVecKHTzk5wttJLVZii1NlJT5jFdXTw2KNfJyESKwoh8pqlPxtj7lf6q/W+RYC8Ov6asUk2Zv/uKfwXg5S2PA+XVxEJK+dbSV2hmYELs+2bvdpqx2dwqK5HeDrk+n9dyCICrNb+gPVK2dXfpKvLXNkmkyvHx18vxqA8iP6bHVXFNjeqqx9pMZrwc72JK/64lllKJz8Sau6EbTeoUHVRTREZP9pAWReovnFzQBWwdiQa0quPTY5KwNGYV4/LVDiD7Ir+m6xGmNN37YE6e2z8pTtA/OyitVcdHazqtNIAto60L0t4NZ3llq/TUTM+w5OzTm/TjGQnLe3jgIkYnZX5KJpaafozlHerfWVKvzczkNPTMnFZV4YYVHYIA8kUr3VCdEl/ZA7bkDK3Zd1SLL928XW4Q5iS6LF6kJVLtELMiapujRwDYuFPMU59ISKf7g+d6GB0wZ55+Rk1gh8bFfpHpaPwH15yjm1tlP0NDmmyjN6ZPHb+JP9/9j1Xjfvt7fgDAAxPyg3tnvySrndIeqlOTcZz2eN3yb5N192sml8pf4qakfCeeo87dxTg/azFxciK/o9wxSLEfkhdvl/ryr+mUXvEWThivc8O1pDsrYGfhrH+SE9PVyZz2Dc1GS9eSXbtf7ZMfxGu3ixN5IRUtL1SCySUQCATWCWtOoZtTraTr9D+msj5/WsraXrvzK3SWSuvO/3dpSBOU+rKiAAtT08OpoKw8D2S3MK4OPlvmHzsnKsNS4eul+s+EKfPOLlEw77/sDgCuSpwqmVrSM/TtHNfSAPsHxexwrK+zFJY4U73w0oRWmnFqy9WqMjfFnGqqTmE3BZWfiJWcuFaioTUpStMULZrQMmsPWB1LU1o+u6dJnJZ7tGxAS52uNOZsNZPI1aoS97WLah2cTDM2osksauKxsgB2rsd945e9ORrfsvU+AP6oXxRmVp2jQ9kmzhRku7c0yXszRZm3k3kx14zm5L1TmpRUmIji1ClqStxMLGVnqGyrMmzRkpqszMLZwpGquVgMtur7xNGbSh2D7Brd2C7X6Js33AvA1Qk5viYnq+XZvn/W0crCWV+0SRzSXxyUMOFcNlr6zllAwqmMrGROF2T7F68P3+iKEBR6IBAIrBPWnEI3SkWRLJdFHVz9E6KGns11skXVYNo17myY8KIU+graqXzSkl3q270nNNHosweej1clUUpqWYjzUzG75G9feicAtzRJmv9sassSmI7lpLTvmVGZi1wmPl1t1yhyS68uUXRlm7gS1zF1aSML6+cZV5VqjumMdxR1f6auf/WifwHKDjrDbOi1ZZEBIqoAd
...(base64 PNG data omitted)...\n", - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "batch_x = np.zeros((batch_size,image_height, image_width, image_channel))\n", - "for n in range(batch_size):\n", - " batch_x[n] = (cv2.imread(directory+images[n], 0).astype(np.float32)/255.).reshape((60,180,1))\n", - "batch_len = np.asarray([max_stepsize for _ in [1]*batch_size], dtype=np.int64)\n", - "decoded = sess.run(model.dense_decoded, feed_dict = {model.X:batch_x,model.SEQ_LEN:batch_len})\n", - "plt.imshow(cv2.imread(directory+images[0], 0).astype(np.float32)/255.)\n", - "plt.title(''.join([decode_maps[i] for i in decoded[0]]))\n", - "plt.show()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": 
{ - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.5.2" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/ocr/1.cnn-rnn-lstm/result.png b/ocr/1.cnn-rnn-lstm/result.png deleted file mode 100644 index a1bdeb9..0000000 Binary files a/ocr/1.cnn-rnn-lstm/result.png and /dev/null differ diff --git a/ocr/2.im2latex.ipynb b/ocr/2.im2latex.ipynb new file mode 100644 index 0000000..294ac3c --- /dev/null +++ b/ocr/2.im2latex.ipynb @@ -0,0 +1,1289 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget http://baidudeeplearning.bj.bcebos.com/image_contest_level_1.tar.gz\n", + "# !tar -zxf image_contest_level_1.tar.gz" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + 
"...(repeated FutureWarning deprecation messages omitted)...\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import os\n", + "import tensorflow as tf\n", + "import matplotlib.pyplot as plt\n", + "from skimage.transform import resize as imresize\n", + "import cv2\n", + "import time" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "100000" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "directory = 'image_contest_level_1/'\n", + "images = ['%d.png'%(d) for d in range(100000)]\n", + "with open(directory+'labels.txt','r') as fopen:\n", + " labels = [i.split()[0] for i in list(filter(None,fopen.read().split('\\n')))]\n", + "len(images)\n", + "len(labels)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + 
"charset = list(set(''.join(labels)))\n", + "num_classes = len(charset) + 2\n", + "encode_maps = {}\n", + "decode_maps = {}\n", + "for i, char in enumerate(charset, 3):\n", + " encode_maps[char] = i\n", + " decode_maps[i] = char\n", + " \n", + "SPACE_INDEX = 0\n", + "SPACE_TOKEN = ''\n", + "encode_maps[SPACE_TOKEN] = SPACE_INDEX\n", + "decode_maps[SPACE_INDEX] = SPACE_TOKEN\n", + "\n", + "GO_INDEX = 1\n", + "GO_TOKEN = ''\n", + "encode_maps[GO_TOKEN] = GO_INDEX\n", + "decode_maps[GO_INDEX] = GO_TOKEN\n", + "\n", + "EOS_INDEX = 2\n", + "EOS_TOKEN = ''\n", + "encode_maps[EOS_TOKEN] = EOS_INDEX\n", + "decode_maps[EOS_INDEX] = EOS_TOKEN" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "GO = 1\n", + "PAD = 0\n", + "EOS = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "image_height = 60\n", + "image_width = 240\n", + "image_channel = 1\n", + "max_stepsize = 128\n", + "num_hidden = 256\n", + "epoch = 20\n", + "batch_size = 128\n", + "initial_learning_rate = 1e-3" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 100000/100000 [02:42<00:00, 614.11it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "X, Y = [], []\n", + "for i in tqdm(range(len(images))):\n", + " img = images[i]\n", + " X.append(imresize(cv2.imread(directory+img, 0).astype(np.float32)/255., (image_height,image_width)))\n", + " Y.append([encode_maps[c] for c in labels[i]] + [2])" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size = 0.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + 
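The charset cell above reserves three integer ids for special tokens — PAD = 0, GO = 1, EOS = 2 — and numbers the real characters from 3 upward via `enumerate(charset, 3)` (the token literals render as empty strings here because their angle-bracket names were stripped). A minimal standalone sketch of that encode/decode scheme, using hypothetical toy labels rather than the contest data:

```python
PAD, GO, EOS = 0, 1, 2

labels = ['5-5*1', '3+2']  # hypothetical toy labels, not the contest data
charset = sorted(set(''.join(labels)))
encode_maps = {c: i for i, c in enumerate(charset, 3)}
decode_maps = {i: c for c, i in encode_maps.items()}
decode_maps.update({PAD: '<PAD>', GO: '<GO>', EOS: '<EOS>'})

num_classes = len(charset) + 3  # ids 0..2 are reserved, chars start at 3

def encode(label):
    # append EOS, mirroring the notebook's "+ [2]"
    return [encode_maps[c] for c in label] + [EOS]

def decode(ids):
    # drop the reserved ids when reading a prediction back
    return ''.join(decode_maps[i] for i in ids if i > EOS)

assert decode(encode('5-5*1')) == '5-5*1'
```

Because characters are enumerated from 3, the largest character id is `len(charset) + 2`, so the output vocabulary needs `len(charset) + 3` entries in total.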
"data": { + "text/plain": [ + "['5', '-', '5', '*', '1', '']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[decode_maps[c] for c in Y[-1]]" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "...(base64 PNG data of the plt.imshow preview omitted)...\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(X[-1], cmap = 'gray')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "# https://github.com/guillaumegenthial/im2latex/blob/master/model/components/attention_mechanism.py\n", + "\n", + "class AttentionMechanism(object):\n", + " \"\"\"Class to compute attention over an image\"\"\"\n", + "\n", + " def __init__(self, img, dim_e, tiles=1):\n", + " \"\"\"Stores the image under the right shape.\n", + " We loose the H, W dimensions and merge them into a single\n", + " dimension that corresponds to \"regions\" of the image.\n", + " Args:\n", + " img: (tf.Tensor) image\n", + " dim_e: (int) dimension of the intermediary vector used to\n", + " compute attention\n", + " tiles: (int) default 1, input to context h may have size\n", + " (tile * batch_size, ...)\n", + " \"\"\"\n", + " if len(img.shape) == 3:\n", + " self._img = img\n", + " elif len(img.shape) == 4:\n", + " N = tf.shape(img)[0]\n", + " H, W = tf.shape(img)[1], tf.shape(img)[2] # image\n", + " C = img.shape[3].value # channels\n", + " self._img = tf.reshape(img, shape=[N, H*W, C])\n", + " else:\n", + " print(\"Image shape not supported\")\n", + " raise NotImplementedError\n", + "\n", + " # dimensions\n", + " self._n_regions = tf.shape(self._img)[1]\n", + " self._n_channels = self._img.shape[2].value\n", + " self._dim_e = dim_e\n", + " self._tiles = tiles\n", + " self._scope_name = \"att_mechanism\"\n", + "\n", + " # attention vector over the image\n", + " self._att_img = tf.layers.dense(\n", + " inputs=self._img,\n", + " units=self._dim_e,\n", + " use_bias=False,\n", + " name=\"att_img\")\n", + "\n", + "\n", + " def context(self, h):\n", + " \"\"\"Computes attention\n", + " Args:\n", + " h: (batch_size, num_units) hidden state\n", + " Returns:\n", + " c: (batch_size, channels) context vector\n", + " \"\"\"\n", + " with 
tf.variable_scope(self._scope_name):\n", + " if self._tiles > 1:\n", + " att_img = tf.expand_dims(self._att_img, axis=1)\n", + " att_img = tf.tile(att_img, multiples=[1, self._tiles, 1, 1])\n", + " att_img = tf.reshape(att_img, shape=[-1, self._n_regions,\n", + " self._dim_e])\n", + " img = tf.expand_dims(self._img, axis=1)\n", + " img = tf.tile(img, multiples=[1, self._tiles, 1, 1])\n", + " img = tf.reshape(img, shape=[-1, self._n_regions,\n", + " self._n_channels])\n", + " else:\n", + " att_img = self._att_img\n", + " img = self._img\n", + "\n", + " # computes attention over the hidden vector\n", + " att_h = tf.layers.dense(inputs=h, units=self._dim_e, use_bias=False)\n", + "\n", + " # sums the two contributions\n", + " att_h = tf.expand_dims(att_h, axis=1)\n", + " att = tf.tanh(att_img + att_h)\n", + "\n", + " # computes scalar product with beta vector\n", + " # works faster with a matmul than with a * and a tf.reduce_sum\n", + " att_beta = tf.get_variable(\"att_beta\", shape=[self._dim_e, 1],\n", + " dtype=tf.float32)\n", + " att_flat = tf.reshape(att, shape=[-1, self._dim_e])\n", + " e = tf.matmul(att_flat, att_beta)\n", + " e = tf.reshape(e, shape=[-1, self._n_regions])\n", + "\n", + " # compute weights\n", + " a = tf.nn.softmax(e)\n", + " a = tf.expand_dims(a, axis=-1)\n", + " c = tf.reduce_sum(a * img, axis=1)\n", + "\n", + " return c\n", + "\n", + "\n", + " def initial_cell_state(self, cell):\n", + " \"\"\"Returns initial state of a cell computed from the image\n", + " Assumes cell.state_type is an instance of named_tuple.\n", + " Ex: LSTMStateTuple\n", + " Args:\n", + " cell: (instance of RNNCell) must define _state_size\n", + " \"\"\"\n", + " _states_0 = []\n", + " for hidden_name in cell._state_size._fields:\n", + " hidden_dim = getattr(cell._state_size, hidden_name)\n", + " h = self.initial_state(hidden_name, hidden_dim)\n", + " _states_0.append(h)\n", + "\n", + " initial_state_cell = type(cell.state_size)(*_states_0)\n", + "\n", + " return 
initial_state_cell\n", + "\n", + "\n", + " def initial_state(self, name, dim):\n", + " \"\"\"Returns initial state of dimension specified by dim\"\"\"\n", + " with tf.variable_scope(self._scope_name):\n", + " img_mean = tf.reduce_mean(self._img, axis=1)\n", + " W = tf.get_variable(\"W_{}_0\".format(name), shape=[self._n_channels,\n", + " dim])\n", + " b = tf.get_variable(\"b_{}_0\".format(name), shape=[dim])\n", + " h = tf.tanh(tf.matmul(img_mean, W) + b)\n", + "\n", + " return h" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "# https://github.com/guillaumegenthial/im2latex/blob/master/model/components/attention_cell.py\n", + "\n", + "import collections\n", + "from tensorflow.contrib.rnn import RNNCell, LSTMStateTuple\n", + "\n", + "\n", + "AttentionState = collections.namedtuple(\"AttentionState\", (\"cell_state\", \"o\"))\n", + "\n", + "\n", + "class AttentionCell(RNNCell):\n", + " def __init__(self, cell, attention_mechanism, dropout, dim_e,\n", + " dim_o, num_units,\n", + " num_proj, dtype=tf.float32):\n", + " \"\"\"\n", + " Args:\n", + " cell: (RNNCell)\n", + " attention_mechanism: (AttentionMechanism)\n", + " dropout: (tf.float)\n", + " attn_cell_config: (dict) hyper params\n", + " \"\"\"\n", + " # variables and tensors\n", + " self._cell = cell\n", + " self._attention_mechanism = attention_mechanism\n", + " self._dropout = dropout\n", + "\n", + " # hyperparameters and shapes\n", + " self._n_channels = self._attention_mechanism._n_channels\n", + " self._dim_e = dim_e\n", + " self._dim_o = dim_o\n", + " self._num_units = num_units\n", + " self._num_proj = num_proj\n", + " self._dtype = dtype\n", + "\n", + " # for RNNCell\n", + " self._state_size = AttentionState(self._cell._state_size, self._dim_o)\n", + "\n", + "\n", + " @property\n", + " def state_size(self):\n", + " return self._state_size\n", + "\n", + "\n", + " @property\n", + " def output_size(self):\n", + " return self._num_proj\n", + 
"\n", + "\n", + " @property\n", + " def output_dtype(self):\n", + " return self._dtype\n", + "\n", + "\n", + " def initial_state(self):\n", + " \"\"\"Returns initial state for the lstm\"\"\"\n", + " initial_cell_state = self._attention_mechanism.initial_cell_state(self._cell)\n", + " initial_o = self._attention_mechanism.initial_state(\"o\", self._dim_o)\n", + "\n", + " return AttentionState(initial_cell_state, initial_o)\n", + "\n", + "\n", + " def step(self, embedding, attn_cell_state):\n", + " \"\"\"\n", + " Args:\n", + " embedding: shape = (batch_size, dim_embeddings) embeddings\n", + " from previous time step\n", + " attn_cell_state: (AttentionState) state from previous time step\n", + " \"\"\"\n", + " prev_cell_state, o = attn_cell_state\n", + "\n", + " scope = tf.get_variable_scope()\n", + " with tf.variable_scope(scope):\n", + " # compute new h\n", + " x = tf.concat([embedding, o], axis=-1)\n", + " new_h, new_cell_state = self._cell.__call__(x, prev_cell_state)\n", + " new_h = tf.nn.dropout(new_h, self._dropout)\n", + "\n", + " # compute attention\n", + " c = self._attention_mechanism.context(new_h)\n", + "\n", + " # compute o\n", + " o_W_c = tf.get_variable(\"o_W_c\", dtype=tf.float32,\n", + " shape=(self._n_channels, self._dim_o))\n", + " o_W_h = tf.get_variable(\"o_W_h\", dtype=tf.float32,\n", + " shape=(self._num_units, self._dim_o))\n", + "\n", + " new_o = tf.tanh(tf.matmul(new_h, o_W_h) + tf.matmul(c, o_W_c))\n", + " new_o = tf.nn.dropout(new_o, self._dropout)\n", + "\n", + " y_W_o = tf.get_variable(\"y_W_o\", dtype=tf.float32,\n", + " shape=(self._dim_o, self._num_proj))\n", + " logits = tf.matmul(new_o, y_W_o)\n", + "\n", + " # new Attn cell state\n", + " new_state = AttentionState(new_cell_state, new_o)\n", + "\n", + " return logits, new_state\n", + "\n", + "\n", + " def __call__(self, inputs, state):\n", + " \"\"\"\n", + " Args:\n", + " inputs: the embedding of the previous word for training only\n", + " state: (AttentionState) (h, o) where h is 
the hidden state and\n", + " o is the vector used to make the prediction of\n", + " the previous word\n", + " \"\"\"\n", + " new_output, new_state = self.step(inputs, state)\n", + "\n", + " return (new_output, new_state)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "from __future__ import division\n", + "import math\n", + "import numpy as np\n", + "from six.moves import xrange\n", + "import tensorflow as tf\n", + "\n", + "\n", + "# taken from https://github.com/tensorflow/tensor2tensor/blob/37465a1759e278e8f073cd04cd9b4fe377d3c740/tensor2tensor/layers/common_attention.py\n", + "\n", + "# taken from https://raw.githubusercontent.com/guillaumegenthial/im2latex/master/model/components/positional.py\n", + "\n", + "def add_timing_signal_nd(x, min_timescale=1.0, max_timescale=1.0e4):\n", + " \"\"\"Adds a bunch of sinusoids of different frequencies to a Tensor.\n", + "\n", + " Each channel of the input Tensor is incremented by a sinusoid of a different\n", + " frequency and phase in one of the positional dimensions.\n", + "\n", + " This allows attention to learn to use absolute and relative positions.\n", + " Timing signals should be added to some precursors of both the query and the\n", + " memory inputs to attention.\n", + "\n", + " The use of relative position is possible because sin(a+b) and cos(a+b) can\n", + " be expressed in terms of b, sin(a) and cos(a).\n", + "\n", + " x is a Tensor with n \"positional\" dimensions, e.g. one dimension for a\n", + " sequence or two dimensions for an image\n", + "\n", + " We use a geometric sequence of timescales starting with\n", + " min_timescale and ending with max_timescale. The number of different\n", + " timescales is equal to channels // (n * 2). For each timescale, we\n", + " generate the two sinusoidal signals sin(timestep/timescale) and\n", + " cos(timestep/timescale). 
All of these sinusoids are concatenated in\n", + " the channels dimension.\n", + "\n", + " Args:\n", + " x: a Tensor with shape [batch, d1 ... dn, channels]\n", + " min_timescale: a float\n", + " max_timescale: a float\n", + "\n", + " Returns:\n", + " a Tensor the same shape as x.\n", + "\n", + " \"\"\"\n", + " static_shape = x.get_shape().as_list()\n", + " num_dims = len(static_shape) - 2\n", + " channels = tf.shape(x)[-1]\n", + " num_timescales = channels // (num_dims * 2)\n", + " log_timescale_increment = (\n", + " math.log(float(max_timescale) / float(min_timescale)) /\n", + " (tf.to_float(num_timescales) - 1))\n", + " inv_timescales = min_timescale * tf.exp(\n", + " tf.to_float(tf.range(num_timescales)) * -log_timescale_increment)\n", + " for dim in xrange(num_dims):\n", + " length = tf.shape(x)[dim + 1]\n", + " position = tf.to_float(tf.range(length))\n", + " scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(\n", + " inv_timescales, 0)\n", + " signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1)\n", + " prepad = dim * 2 * num_timescales\n", + " postpad = channels - (dim + 1) * 2 * num_timescales\n", + " signal = tf.pad(signal, [[0, 0], [prepad, postpad]])\n", + " for _ in xrange(1 + dim):\n", + " signal = tf.expand_dims(signal, 0)\n", + " for _ in xrange(num_dims - 1 - dim):\n", + " signal = tf.expand_dims(signal, -2)\n", + " x += signal\n", + " return x" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "attention_size = 256\n", + "size_layer = 256\n", + "embedded_size = 256\n", + "beam_width = 15\n", + "learning_rate = 1e-4" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "# CNN part I took from https://github.com/guillaumegenthial/im2latex/blob/master/model/encoder.py\n", + "# I use tf.contrib.seq2seq as decoder part\n", + "\n", + "class Model:\n", + " def __init__(self):\n", + " self.X = 
tf.placeholder(tf.float32, shape=(None, 60, 240, 1))\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " batch_size = tf.shape(self.X)[0]\n", + " x_len = tf.shape(self.X)[2] // 2\n", + " main = tf.strided_slice(self.Y, [0, 0], [batch_size, -1], [1, 1])\n", + " decoder_input = tf.concat([tf.fill([batch_size, 1], GO), main], 1)\n", + " \n", + " decoder_embeddings = tf.Variable(tf.random_uniform([len(encode_maps), embedded_size], -1, 1))\n", + " \n", + " img = self.X\n", + " \n", + " out = tf.layers.conv2d(img, 64, 3, 1, \"SAME\",\n", + " activation=tf.nn.relu)\n", + " out = tf.layers.max_pooling2d(out, 2, 2, \"SAME\")\n", + "\n", + " out = tf.layers.conv2d(out, 128, 3, 1, \"SAME\",\n", + " activation=tf.nn.relu)\n", + " out = tf.layers.max_pooling2d(out, 2, 2, \"SAME\")\n", + "\n", + " out = tf.layers.conv2d(out, 256, 3, 1, \"SAME\",\n", + " activation=tf.nn.relu)\n", + "\n", + " out = tf.layers.conv2d(out, 256, 3, 1, \"SAME\",\n", + " activation=tf.nn.relu)\n", + " out = tf.layers.max_pooling2d(out, (2, 1), (2, 1), \"SAME\")\n", + " out = tf.layers.conv2d(out, 512, 3, 1, \"SAME\",\n", + " activation=tf.nn.relu)\n", + " out = tf.layers.max_pooling2d(out, (1, 2), (1, 2), \"SAME\")\n", + " out = tf.layers.conv2d(out, 512, 3, 1, \"VALID\",\n", + " activation=tf.nn.relu)\n", + " img = add_timing_signal_nd(out)\n", + " print(img)\n", + " \n", + " with tf.variable_scope(\"attn_cell\", reuse=False):\n", + " attn_meca = AttentionMechanism(img, attention_size)\n", + " recu_cell = tf.nn.rnn_cell.LSTMCell(size_layer)\n", + " attn_cell = AttentionCell(recu_cell, attn_meca, 1.0,\n", + " attention_size, attention_size, size_layer, len(encode_maps))\n", + "\n", + " encoder_state = attn_cell.initial_state()\n", + "\n", + " training_helper = tf.contrib.seq2seq.ScheduledEmbeddingTrainingHelper(\n", + " inputs = tf.nn.embedding_lookup(decoder_embeddings, decoder_input),\n", + " sequence_length = 
self.Y_seq_len,\n", + " embedding = decoder_embeddings,\n", + " sampling_probability = 0.5,\n", + " time_major = False)\n", + " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", + " cell = attn_cell,\n", + " helper = training_helper,\n", + " initial_state = encoder_state,\n", + " output_layer = None)\n", + " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = training_decoder,\n", + " impute_finished = True,\n", + " maximum_iterations = tf.reduce_max(self.Y_seq_len))\n", + " \n", + " with tf.variable_scope(\"attn_cell\", reuse=True):\n", + " attn_meca = AttentionMechanism(img, attention_size, tiles=beam_width)\n", + " recu_cell = tf.nn.rnn_cell.LSTMCell(size_layer, reuse = True)\n", + " attn_cell = AttentionCell(recu_cell, attn_meca, 1.0,\n", + " attention_size, attention_size, size_layer, len(encode_maps))\n", + " \n", + " encoder_state = attn_cell.initial_state()\n", + " \n", + " predicting_decoder = tf.contrib.seq2seq.BeamSearchDecoder(\n", + " cell = attn_cell,\n", + " embedding = decoder_embeddings,\n", + " start_tokens = tf.tile(tf.constant([GO], dtype=tf.int32), [batch_size]),\n", + " end_token = EOS,\n", + " initial_state = tf.contrib.seq2seq.tile_batch(encoder_state, beam_width),\n", + " beam_width = beam_width,\n", + " output_layer = None,\n", + " length_penalty_weight = 0.0)\n", + " predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", + " decoder = predicting_decoder,\n", + " impute_finished = False,\n", + " maximum_iterations = x_len)\n", + " \n", + " self.training_logits = training_decoder_output.rnn_output\n", + " self.predicting_ids = predicting_decoder_output.predicted_ids\n", + " \n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(logits = self.training_logits,\n", + " targets = self.Y,\n", + " weights = masks)\n", + " self.optimizer = 
tf.train.AdamOptimizer(learning_rate).minimize(self.cost)\n", + " y_t = tf.argmax(self.training_logits,axis=2)\n", + " y_t = tf.cast(y_t, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.Y, masks)\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: Logging before flag parsing goes to stderr.\n", + "W0829 22:55:10.605042 139914025953088 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "W0829 22:55:10.637653 139914025953088 deprecation.py:323] From :19: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv2D` instead.\n", + "W0829 22:55:10.642420 139914025953088 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0829 22:55:10.882626 139914025953088 deprecation.py:323] From :20: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.MaxPooling2D instead.\n", + 
"W0829 22:55:11.109754 139914025953088 deprecation.py:323] From :50: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "W0829 22:55:11.177250 139914025953088 deprecation.py:323] From :40: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"add_1:0\", shape=(?, 6, 28, 512), dtype=float32)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0829 22:55:11.449484 139914025953088 deprecation.py:323] From :42: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0829 22:55:11.623882 139914025953088 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0829 22:55:12.201008 139914025953088 deprecation.py:506] From :75: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "W0829 22:55:12.309709 139914025953088 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:107: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.random.categorical` instead.\n", + "W0829 22:55:12.324810 139914025953088 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:379: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0829 22:55:12.765539 139914025953088 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py:985: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", + " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2.9175596, 0.10526316)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = train_X[:5]\n", + "batch_x = np.array(batch_x).reshape((len(batch_x), image_height, image_width,image_channel))\n", + "y = train_Y[:5]\n", + "batch_y, _ = pad_sentence_batch(y, 0)\n", + "loss, logits, acc = sess.run([model.cost, model.training_logits, model.accuracy], feed_dict = {model.X: batch_x,\n", + " model.Y: batch_y})\n", + "loss, acc" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:30<00:00, 7.02it/s, accuracy=0.906, cost=0.305]\n", + "minibatch loop: 100%|██████████| 157/157 [00:09<00:00, 12.39it/s, accuracy=0.948, cost=0.205]\n", + "minibatch loop: 0%| | 1/625 [00:00<01:29, 6.94it/s, accuracy=0.932, cost=0.252]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 1, training avg loss 1.519639, training avg acc 0.485971\n", + "epoch 1, testing avg loss 0.280290, testing avg acc 0.919022\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", 
+ "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.17it/s, accuracy=0.991, cost=0.0442]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.64it/s, accuracy=0.991, cost=0.0261]\n", + "minibatch loop: 0%| | 1/625 [00:00<01:28, 7.07it/s, accuracy=0.994, cost=0.0253]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 2, training avg loss 0.098987, training avg acc 0.973223\n", + "epoch 2, testing avg loss 0.036170, testing avg acc 0.991269\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.08it/s, accuracy=0.994, cost=0.0228] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.64it/s, accuracy=1, cost=0.00788] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:26, 7.23it/s, accuracy=0.999, cost=0.00922]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 3, training avg loss 0.026419, training avg acc 0.993798\n", + "epoch 3, testing avg loss 0.015001, testing avg acc 0.996805\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:28<00:00, 7.09it/s, accuracy=0.997, cost=0.0117] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.56it/s, accuracy=1, cost=0.00262] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:28, 7.08it/s, accuracy=0.999, cost=0.00642]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 4, training avg loss 0.013676, training avg acc 0.996753\n", + "epoch 4, testing avg loss 0.009664, testing avg acc 0.997876\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 6.94it/s, accuracy=1, cost=0.00306] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.63it/s, accuracy=1, cost=0.00112] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:28, 7.06it/s, accuracy=0.995, 
cost=0.042]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 5, training avg loss 0.012101, training avg acc 0.997094\n", + "epoch 5, testing avg loss 0.009274, testing avg acc 0.998141\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.12it/s, accuracy=0.999, cost=0.00509]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.68it/s, accuracy=1, cost=0.00279] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.16it/s, accuracy=1, cost=0.00305]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 6, training avg loss 0.009302, training avg acc 0.997793\n", + "epoch 6, testing avg loss 0.010704, testing avg acc 0.997450\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.08it/s, accuracy=0.999, cost=0.00318]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.55it/s, accuracy=1, cost=0.000676] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:30, 6.93it/s, accuracy=0.998, cost=0.00467]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 7, training avg loss 0.005349, training avg acc 0.998737\n", + "epoch 7, testing avg loss 0.006314, testing avg acc 0.998461\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.15it/s, accuracy=0.999, cost=0.00234]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.59it/s, accuracy=1, cost=0.000823] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:26, 7.18it/s, accuracy=1, cost=0.00265]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 8, training avg loss 0.007786, training avg acc 0.998095\n", + "epoch 8, testing avg loss 0.007873, testing avg acc 0.998013\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + 
"minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.15it/s, accuracy=0.999, cost=0.00574]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.64it/s, accuracy=1, cost=0.000296] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:34, 6.59it/s, accuracy=1, cost=0.000733]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 9, training avg loss 0.003831, training avg acc 0.999121\n", + "epoch 9, testing avg loss 0.003734, testing avg acc 0.999153\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 6.96it/s, accuracy=1, cost=0.00114] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.64it/s, accuracy=1, cost=0.000188] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:24, 7.35it/s, accuracy=1, cost=0.00105]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 10, training avg loss 0.004331, training avg acc 0.998917\n", + "epoch 10, testing avg loss 0.003307, testing avg acc 0.999179\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.04it/s, accuracy=0.999, cost=0.00771]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.63it/s, accuracy=1, cost=0.000244] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:29, 6.94it/s, accuracy=1, cost=0.000596]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 11, training avg loss 0.002805, training avg acc 0.999352\n", + "epoch 11, testing avg loss 0.003485, testing avg acc 0.999247\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.11it/s, accuracy=0.999, cost=0.00233]\n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.59it/s, accuracy=1, cost=0.000613] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.16it/s, accuracy=0.999, cost=0.00137]" + ] + }, + 
{ + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 12, training avg loss 0.003280, training avg acc 0.999200\n", + "epoch 12, testing avg loss 0.003192, testing avg acc 0.999281\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.15it/s, accuracy=1, cost=0.00131] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.63it/s, accuracy=1, cost=0.000401] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.14it/s, accuracy=0.999, cost=0.00215]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 13, training avg loss 0.002531, training avg acc 0.999378\n", + "epoch 13, testing avg loss 0.003290, testing avg acc 0.999273\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.08it/s, accuracy=1, cost=0.000158] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.56it/s, accuracy=1, cost=5.84e-5] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.11it/s, accuracy=1, cost=7.04e-5]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 14, training avg loss 0.001583, training avg acc 0.999608\n", + "epoch 14, testing avg loss 0.001468, testing avg acc 0.999667\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.10it/s, accuracy=1, cost=8.3e-5] \n", + "minibatch loop: 100%|██████████| 157/157 [00:09<00:00, 17.41it/s, accuracy=1, cost=5.81e-5] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:25, 7.30it/s, accuracy=1, cost=3.57e-5]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 15, training avg loss 0.001462, training avg acc 0.999669\n", + "epoch 15, testing avg loss 0.000975, testing avg acc 0.999729\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 
100%|██████████| 625/625 [01:27<00:00, 7.18it/s, accuracy=1, cost=9.13e-5] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.60it/s, accuracy=1, cost=2.84e-5] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.15it/s, accuracy=1, cost=3.45e-5]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 16, training avg loss 0.000887, training avg acc 0.999786\n", + "epoch 16, testing avg loss 0.000834, testing avg acc 0.999796\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.01it/s, accuracy=1, cost=9.9e-5] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.57it/s, accuracy=1, cost=4.35e-5] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.10it/s, accuracy=1, cost=5.74e-5]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 17, training avg loss 0.003594, training avg acc 0.999160\n", + "epoch 17, testing avg loss 0.001189, testing avg acc 0.999729\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:28<00:00, 7.07it/s, accuracy=1, cost=0.000129] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.59it/s, accuracy=1, cost=3.95e-5] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:28, 7.06it/s, accuracy=1, cost=3.42e-5]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 18, training avg loss 0.000426, training avg acc 0.999906\n", + "epoch 18, testing avg loss 0.000875, testing avg acc 0.999742\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.17it/s, accuracy=0.999, cost=0.00295] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.69it/s, accuracy=1, cost=0.00165] \n", + "minibatch loop: 0%| | 1/625 [00:00<01:27, 7.17it/s, accuracy=0.999, cost=0.00196]" + ] + }, + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "epoch 19, training avg loss 0.001924, training avg acc 0.999559\n", + "epoch 19, testing avg loss 0.004051, testing avg acc 0.999055\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 625/625 [01:27<00:00, 7.15it/s, accuracy=1, cost=5.21e-5] \n", + "minibatch loop: 100%|██████████| 157/157 [00:08<00:00, 17.54it/s, accuracy=1, cost=8.75e-5] " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 20, training avg loss 0.001526, training avg acc 0.999603\n", + "epoch 20, testing avg loss 0.000684, testing avg acc 0.999837\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "for e in range(epoch):\n", + " pbar = tqdm(\n", + " range(0, len(train_X), batch_size), desc = 'minibatch loop')\n", + " train_loss, train_acc, test_loss, test_acc = [], [], [], []\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_X))\n", + " batch_x = train_X[i : index]\n", + " batch_x = np.array(batch_x).reshape((len(batch_x), image_height, image_width,image_channel))\n", + " y = train_Y[i : index]\n", + " batch_y, _ = pad_sentence_batch(y, 0)\n", + " feed = {model.X: batch_x,\n", + " model.Y: batch_y}\n", + " accuracy, loss, _ = sess.run([model.accuracy,model.cost,model.optimizer],\n", + " feed_dict = feed)\n", + " train_loss.append(loss)\n", + " train_acc.append(accuracy)\n", + " pbar.set_postfix(cost = loss, accuracy = accuracy)\n", + " \n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_X), batch_size), desc = 'minibatch loop')\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_X))\n", + " batch_x = test_X[i : index]\n", + " batch_x = np.array(batch_x).reshape((len(batch_x), image_height, image_width,image_channel))\n", + " y = test_Y[i : index]\n", + " batch_y, _ = pad_sentence_batch(y, 0)\n", + " feed = {model.X: batch_x,\n", + " model.Y: batch_y,}\n", + " accuracy, 
loss = sess.run([model.accuracy,model.cost],\n", + " feed_dict = feed)\n", + "\n", + " test_loss.append(loss)\n", + " test_acc.append(accuracy)\n", + " pbar.set_postfix(cost = loss, accuracy = accuracy)\n", + " \n", + " print('epoch %d, training avg loss %f, training avg acc %f'%(e+1,\n", + " np.mean(train_loss),np.mean(train_acc)))\n", + " print('epoch %d, testing avg loss %f, testing avg acc %f'%(e+1,\n", + " np.mean(test_loss),np.mean(test_acc)))" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(9, 15)" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "decoded = sess.run(model.predicting_ids, feed_dict = {model.X: batch_x[:1],\n", + " model.Y: batch_y[:1]})[0]\n", + "decoded.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n", + "9+(0+1)\n" + ] + } + ], + "source": [ + "# iterate over every beam in `decoded` (shape: time x beam); indexing the\n", + "# loop variable `i` instead of a fixed column 0 so each beam is printed\n", + "for i in range(decoded.shape[1]):\n", + " d = decoded[:,i]\n", + " print(''.join([decode_maps[i] for i in d if i not in [0,1,2]]))" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAXAAAACDCAYAAACUaEA8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOy9d5hl6VXe+1s7nHwqh+6qznF6Qk/WJA0z0khWGAmBBTIIgwCBkA22L9gXG/A1ugb84AsIfG2RFBAYhECDQEJCiBlJM5okTc49PR2mOlXoyunkvT//sb59zqnqqu6qru7p7pn9Pk8/fersffb+dlp7fWu9611ijCFGjBgxYlx6cC70AGLEiBEjxtkhNuAxYsSIcYkiNuAxYsSIcYkiNuAxYsSIcYkiNuAxYsSIcYkiNuAxYsSIcYkiNuCvM4jIZ0Xk1+3n20Vk/4Ue05kgIg+LyLWv4f7eKyJ/9Vrt71xBRO4UkeMXYL/dIvKyiKRfw33+joj8q9dqf5cqYgP+OoYx5kFjzO4zrSciPy4iD61m2yLSLyJfEpEJETkuIh89mzGKyHuBWWPM003f/byIDIvIjIh8RkSSZ7HdXxOR50WkJiIfa15mjPl74AoR2Xs2Y17FGLaIiBER73zu5zT7v1VEHhORWRF5TkTefJab+k/AZ40xRbvdpL0uM/Y6/cJZjC0hIveIyIA9R3cuWuW3gV8WkcRZjvkNgdiAX8S4UA/+CvHnwKtAL3A38N9E5C1LrSgi9y/xgEb4KPC/m9Z9B2ow7gI2A9uA/3eZ7X5WRH58me0eBH4R+Ooyy/8S+Mgyyy55iEgH8PfAbwFtwP8H/L2ItC+z/pIVffbl+SH0ekf4GLATvT5vAX5RRN65zO8HRGTLMsN8CPiXwPDiBcaYIeBl4HuX+W0MYgP+msPe0L8kIi+JyKSI/ImIpOyyO603+x9FZBj4E/v9e0TkGRGZEpFHmj1HEblWRJ6yXtZfAammZQum3CKyUUS+KCKjIjIuIv9LRPYAfwjcIiJzIjK1gmPIAXcCv2GMqRpjngXuAX5yleciAbwVeKDp6w8BnzbGvGiMmQR+Dfjx1WwXwBjzp8aYrwGzy6xyP/riWelYv2C9zWkR+baIXNG0LG2n/Efs8odsuOHbdpUpe25vEZGPicifN/12gZcuIj8hIvvs9TwsIj+z2mO3uBUYNsZ8wRgTGGP+HBgF/vkqt3MTMGWMaQ7dfAj4NWPMpDFmH/BJVnmNjDEVY8zvGWMeAoJlVrufVVyjNyJiA35h8CPAO4DtwC7gPzctWwd0oN7NR2xs+DPAzwCdwB8BX7bT2ATwd6gH2wF8AXj/UjsUERf4CnAE2AL0A5+3D+BHgUeNMTljTJtd/4Mi8twy45dF/0efr1zpCbDYCYSLjMMVwLNNfz8L9IpI5yq3fSbsA7aISMsK1/8aOt4e4CngL5qW/TZwPWo0O1DPPwS+xy5vs+f20RXs5yTwHqAF+Angd0XkuqVWFJHfF5HfP822ZIm/V3uNrgLqeRTrwa/n1Gt0Bece+4Crz8N2XzeIDfiFwf8yxhwzxkwAvwH8cNOyEPhVY0zZxhw/AvyRMea71pP6U6AM3Gz/+cDvWU/4HuDxZfb5JqAP+L+NMfPGmJL1fpaEMeZzxpglY8TGmFngYeD/EZGUNTDvBzKrOAegU/vFHnIOmG76O/qcX+W2z4Rov20rWdkY8xljzKwxpoyGEK4WkVYRcdCZx78zxpyw1+gRu96qYYz5qjHmkFE8APwTcPsy6/5rY8y/XmZTjwJ9IvLDIuKLyIdQh2Gt1yhn/198jc719cHud0XX542K2IBfGBxr+nwENawRRo0xpaa/NwP/3oZPpmyIY6P9TR9wwixUJDuyzD43AkeMMbW1Dx/QWcRW9Fj+AI2RNodrmsf7ZuArTd/9J7vaJKc++HOo9xkh+jxrt/tc03Y/CPx+03ZP540uRrTflYSMXBH5TRE5JCI
zwIBd1GX/pYBDq9j36fb1LhH5jmhyeAp4t93HqmCMGQfeB/wCMAK8E7gPe41E5M2LrtGCayaNhOfiazRn/198jaLrs2nRdjcBzzV998FVHEaeFVyfNzIu5iTZ6xkbmz5vAgab/l6cTDqGxpp/Y/FGROQOoF9EpMmIb2JpY3IM2CQi3hJGfNWSlMaYI+hUPxrL54DHmpa3NS27H/iYMeb+RZs5qIul3xhzwn73Ijpt/mv799XAiDVINM8KROSzwP3GmM+udvzAHmDAGDOzgnU/iBrDt6HGuxU1bAKMASXUu3120e+WOq/zLPSC10UfbMLwb4AfA75kjKmKyN9xaihkRbAe/I122x5wGPgdu+whmrxbewst5e0+B/x80zYnRWQIvS732q+vRq8bxpiji7Y7ANxpjBk4i0PYw6nnNEYTYg/8wuBnRWSDKFPgV4DTcZI/CXxURG4SRVZE7haRPDpNrgH/1k6T/zkaKlkKjwFDwG/abaRE5Da7bATYIKugbInIHhHJi9LB/iXwz4CPr/T3oIks1Cu8o+nrPwM+LCKXi0gbmh/47Gq2a8fniyaHHcCzx+s2rXIHGteO1v+YfdEshTwathpHje9/azqGEM1RfFxE+qy3fos1xqNoSGxb07aeAb7HeqqtwC81LUsA0e9qIvIu9LyeFUQT3L6N8/82cMwY8/VVbuYxoE1E+pu++zPgP4tIu4hcBvw0Z3eNkvYaASTsNWp+WS24RjFORWzALww+h8Y2D6Pe8q8vt6Ix5gn0AflfqNd3EJvxtwbwn9u/J4B/AXxxme0EwHuBHcBRdCr9L+zib6Ie1LCIjAGIyI+IyIunOYZ32PFPoknQdxpjRk971Evjj4AfbRrnP6KUt2/ZcR4BfvUstvtJoIjmF37Ffv7RpuU/bPcdYSMa118Kf2bHcQJ4CfjOouX/AXgezT9MAP8dcIwxBTTH8bANH9xsjLkXfWE/BzyJJpaBem7h36Kzj0nU8//ycgcoIn8oIn+43HI0mTqGzr7WA99/mnWXhL3HPovS/SL8KnrfHkEZRL9lr9tqsR+9Lv3A1+3nzQAish64HE3Sx1gGEjd0eG1hp5Q/ZYy570KP5WKBiDwM/FxzMc953t97gR81xnyg6btngLuiUE2MBkSkG3gQuDYq5nkN9vk7wCFjzGryGm84xAb8NUZswGPEiHGuEIdQYsSIEeMSxZoMuIi8U0T2i8jBJmpYjNPAGLMl9r5jxIhxLnDWIRSb0X8FeDuaEHsc+GFjzEvnbngxYsSIEWM5rMUDfxNw0Bhz2GaqP49yZWPEiBEjxmuAtRTy9LOwovA4KnyzLLo6XLNlo7/s8hfnOwDwD5UWfF/tyXLFugZD7cWRbv1gWPB9jBgxYrwe8eRz5TFjTPfi7897JaaIfAQr27mp3+Oxr29ccr2jtTkeKW4A4PPDb2L/N7YDUN5eYnPfOKOzKsGQvyfPjRtVvKzc5jKxR/ilH/wbAN6XHaDdXa3UwxsDc2GJoKkw0EXIOanT/CJGjBgXC9z1B5eUyFhLCOUEC0vCN9jvFsAY88fGmBuMMTd0d7qLF8eIESNGjLPEWjzwx4GdIrIVNdw/hFaOnRU2eTnyGbX/Ozf9vQppAgPVLj5++G1497cC4BcaMh5TOxy+9+5H2eKPne1uX7cohBWmwwoAU6HDYJDnu/M7APCdGrdnXmGDp7pE3W6SpCwf2ooRI8bFibM24MaYmoj8HFoC6wKfMcacrvT6jIjCH9c3Oerd7iD/Zuu3+Pz7VeJj/ze241ytSpY/vftR3pZ7iT2+Gh9f4vBJhIKp8t2y6iR9afxaBmY7mK+o1Em56vHX3vX8+DaVp745fYhtXgEgDkHFiHEJYU0xcGPMPwD/cI7GEiNGjBgxVoGLXk42Cq3s3PT3+sVPNJZ1uxU2ebmlf/gGRdVognc0EA6UewF4YWw9Y0OtOHM6tZEA5jz47aPvAuDGvYf4yXXa2+HKxDjr3QyuxEW6lzKmQ5UsGQl
CRoM0U6HOrPJOiW5HZ1vdrqHdSb+hr/VkoOficG2hKbxUbMtFb8BBp/XXL5n/jBtWNyMwIUOBPrjfLW3j0QlVMR0fy5M+6mOjJIhNI9Qy+uA+PbeLX75Mewb8lz1f5brkML2uNoKPY+MXL+ZCpduOBDVmQx/H9iXOS43pUK/bV2av5QuHr2VuThlHuVyJOzYcBOBfdT1Aq2N4o1ILlPmmPIyI+RaFZz98Snj24jxLl4QBj3FmVE3AZFjimXIPAJ8aeDNDL+vnzLCDNw9uRR9wCfU3ntWV80rCLMrB/zV5N7962Ve4PaWJ4aR7qgEPjG6gbGqM2UTpYE29vJI5df2UVGlzCnS7usP1boKME798zwbRDGssKHKgph7ilyav48nxTVRDfSH/u23fICG63nMz/cwdbiU9osvm1id5LqPS3ofbOtjszVy0xul8oGoC9lWrANw3t5f/+fhbANj8BYdeqswNaqOhTxTu4G23X/xF5W/cuVOMGDFiXOKIPfDXCebCMs+U2/jD43cCcOLVLjLW60pMG5wlOmHaGTdu0ZCY1EYoE6MtHNy6jmuSJwFoX/SbqgkYs2Gaw7UMX5u5HoBvDO3i5FgLYVm9OTcdENjPBIKTrvG2XS8DsDszQp8/CcA6b5qN3gzrXfXIY8/89JgLtVfys5VOfvfI2wF4ZX8f3qxL0KuzoVf61rM3fRSAnuQcYTok9PVe8KcdToxpx7OBDd0M+WNssE1w3mjhsk88fQedD0f3mz4g47fpOfw31z5wgUa1OsQG/BJHwYYwjtRcHp7fxb79Ws2aPu6RsL3E7Wz6FBjbvMo4grG2VtwQX4Jlp2bTYalOT7xn9AYeOahxdmc0QWLWwUZJkIDGS0Mg9H2+feBaAL6VNtTS9u3RXuGde17iiqzWALS5BXwbpO9xZ+nzZul1dTStTnqlp+V1gyhcNRkWGQkcni9r+OPrk1dyYJ9+zpzQi1dode1vHHpcvfjf2/4U89cmeGjmSgASU0JwVM/jH2XeTPflM+Qdbcnas0S47PWMn732AT6duQWA+Wdb2X3XIT66Ttu63po+dkkkMeMQSowYMWJcoog98EsYc2GJw9bLvWf6Rv765WtJnlQvzJ8DJ0paLqEYbBz9BxCkoNKunt763il6/SkSsnQj9OnQ8HRhMwCPHdmMN6hsFX9acMvg6oQAp7pwpxJCcrLxd2Cn9LVsim8duo5/yl8DQJg2GFd/67VUuHv3C3ywQwuOLvdLbyj9lqoJmLZMk5eqWX5/6K08/qqeexlKkZrSa+TPQ5BE2ycDVePWE8bdbpEt6XHub9VpWHLCIzWmvyscauXvuq8j26thmZuS43S52dfq8C4IfHHZm9BnpK3lOe64TsN6XKf/bfP0gWp3L37vGy5xAx4xL8YDG78NU8yGOj0MELrdWTodvTm7XPd1MwWPptUFE/D1ub0AfPXIFZiBLAn7UEeMk2Y0QiZgXKFmn9VyuyHsUsvbm5ml053Dp2HAo/0NBQWer6zj2SkN09TGUqSnrRGZs/u0u13qpdEM1xp4dwoS08Cw2LEJoa+fKy0eXypfzfAuZQb89LoH2O0rzetcl/8386YHa3kATgZ5SqFPp5UcWOfO0GvfUOeTKx8xTSbDEi9VdCz//4m7eGZgI/5RfWGmT0r9JSmhnjdvVg3TiVIbh6qavdCQVABOdGEaL9nEhMOjL+ygK6nHt7P7frreOIQUNnk5Np1iAS+tHMwlZ8ADEzJpH7bBwOX58ib+YlBVbMcKWUpVPaRq1SOTKvOBLdon9wMtT9P6OgkYhdZKDtY8Xi2qwuTMTJrUtOBaJd6IKtiMyOMOE0ItA7WMbqfaUaOvdwqAG9qO0ufOkpTGrRHtbyRI8IfH7uCVl2ycfdDF12dfjYk5s+FeCmJoGP4QsE1G/FmBQ0kem98JwLEdbfzyDi38vTU5sSTF8WxQNQEjgZ6wT0/cxjdP7AJg7GQLlBwkq17ZlVsG+eWNXwWg2w1wz1MEctJ
63Q8W1/O3Y+oavji0Hm8wSWpcX25OxeA05zYMOMqO44GXd7HzOk1Cf3/L0+xIDZPrngegPNGKW9ZtJGahMucyWFSdoYkgBW+sMPglj9eJSYsRI0aMNx4uGQ+8bNS9GAnKPGVZEJ8buYnHX94KNX0PSU1wrHchNWEqmeVv3asBqG5weUvuJda5Wo54votJAhNSNgu5e1HBxFoKJ6KwEcBArZenRtUbluEUXgmcYGGxToTQVc8boNIKpd4Ap13n0v2d09y57gAA72t5ho2eg2tj4NNhkeP2MB6Yv4LDI12kRnT8yaml6YkRIo/fLAqnS3h6Tz3yLKVocKpCRJEZTHZyf88eANranuRKKZ6TsFhII2yyf6aXsSH1SNNHfbx5qGX1MXmBPv4mfwMApdZn2ebP0OWce/rjlL1235nbziOvqC5+8nCSxCy4JT1xzhLMIlt8ya27DnFdZgCArBNyffIEv3L51wD4Dd5FKdDjS0wLTkWYq2pYZiZMUQjnYyrnJYRLwoAHJqRkjeGBaisfP6z81xP7ekmPOnVjJaZBmZNADdb0hFYj/mnuLXym5zbeeYUKJt7Zuo9dvk4z+7wanedYEyLEMGeqHKvpU9XhVOi1XOe1GvDhQH//pfFrGRlSTm96SnBLZtnQiXE1bAJQaQtJ9hb4wK6nAPie3MvstLHlVscl56TqL8wjNeHLM5pg/POXb8QcyeDpbHxZeiKo0Y5i2UESgkRjim+8hUbdqVIv83ebkp9i9IWUmLUrD/jcU70RgKFrWvmp3m9zpeVKrlXTo2JfEhOlDM68Wx9X89gSAynuKej+H926lR/b/B3em30FONcGXLc1VGqFWX1EEzPgFZe+vhh9xwU5XbglM85OfxzQOG9gQvy0Ns/66O4H+bR7KwDFpzpxi8KRca3CfaxrOzv9J9hkw2dvZI2USwXxFYoRI0aMSxQXvQc+HRYZCUIOVDsB+PLEtRwfUo8hNeaQnAIJFnptACyaooe+UBtN8s0jmhT6WvdeerdMAPD+TU+zMznCRk//7nUrdJzl1LhZ3ezF8maemNsKwLb0KNektDpuhz9z1iyGgqlyrKbnYqjQghRt5WOVU4458nJDV6iloZrTFYK2Gru6xnlLbh8A1ybnaXUW0qYmAmXvvFTeyHcm9BhqR3JkRgWv0ERPjDxCG+kIXbtTh7rHX81CNR8SpnVlv71ET5tmP9N+lUP7+ur0x8Ss1MMEEuo/r2jDBlVwAr1lH3Z3Ugld/vMGTSquRZQpMKau4VINnQWzOAmpFyclQ8DOnoZbWzjU28NUWoWh1p/lvheOI2Q8LDJQ7QPg6Gw7iUmnPhYj0EQOqt/rtbRQzQM22bo1OUqmaT1XHPqtzvtVqWNsaNXZ1r5MB8lxoXRU6Uj/1LKHN2dfoc/Ta3++krQxzh0uSgPezDR5qZrl8+M38/WXNfZpJpIkJ2xZ8JyyH1bCfHArBrcCvq1OTEx7TI9oeOUPX3k73voCV/drNeB7up7lnVltQZc03hkNbUT7GgmKvFBR4/qZ4Tfz+HPbcUr6W5MwdG5RIvS/33kv16WOs8GyKFbzkqgaw3BVwybHJttw56L4/xIrRw+xo/HRaosa0P7+CX6s71F2+DO6f1kYRy6EFQYDHdO3pi/jxai6c0q53ktO41EDE1iadrnT1F8YTleZHetGeWvPfkArBTOuGol13jTfbtnNfQd3A1B8NUP6pKVClhcyWyQAO2T8cY+Xx3p4pENjxH7mAH2e/i4nybOe/hsjixgxIGH0AhG8gu6jXHYpBj7Vc2jkQgyDNY8/G9QQx/EDPWTnbE4nAONJRNCps35AQ1SlzRU+fN3DANycfpXWRfdUdD6yUqEtoU5GkA6RwMWf1WWjMzmOVju4PKH3fqfjxmGUixwXpQEPMYxabvefnLydB17eRXLAyptONBWohIBA0BRrNV59IzhVawRoSoxFSnwFg1OLDIVDuZzlJV+Tozuyo9xoveX2FXh2kQE/UsvwWwP
vAGDghT5yxx38Oev1Jh1mJ9S4/8rk9/HLN3yNd2fVe1uNAS8ZmAjUYyrMpEgVIt73QsNqpIk26FnD2qJWfmfbKFcnT9DrquFeHJOvEvBkSQ3js2P9eFM2Dju7xAszSlQ6EKSEqlK2qXTX2LNTX4h39z7PjenDtDmaNG1zGrG7gjFs8ceYq+k5eDjYTlH0LZCYEvz5hQVJkTFNTgjF/W18vHQXAI9v2cq7O54D4LrkIBu89IpzDa4IKdEAfWgEqUWJcBDTiDtL1eAW7fme8BkutVAy5444XTUBJ4JWxovqLTslqc8GahmbE7B5BJpe2EESkrkyO5IjAGz13GXvqZQE9CbVi5H2CrWxdP1FUJpO8q2pPexK6HYyiQo5eeMUTl2KiF+vMWLEiHGJ4iL1wEMGA6V1DRVaYNarhz6aCxiMNOK7AKUeQ9CjXl4qV2Z+JItrp4deUaf/zU0Noko2ryCEnjA/rp7Py129TLSq51E1tdN6cs10wYHqeibmdRvevODPN8bqFEydlVGe9Xm13E0hc3Atp0mPox6vXaLy0mnMTGq5kCu3qkf8A12P0+0anOaAqkVU1HKiopV8M4UUXjEq4Tx1/1GcPUiK0hP71EV829UvcWeblilfkzxuvcJTy7QDE5KRIr+w7l4Abm09xD9svAqA51/eSGrIryslesXG+dRzK1QKGrv/5vQeRi/Tz8H6R+lwh2mVlVEMA2Oo2nlWe6rIWDqatakH3JxXiTxgqQqFWmJJ/fOzhS8u/e40nWm9SYfba8wlI5UxVRKMzoVbohEis/9HTJqQZWJcQIcTsDOtHnZ/9xTHaw7uiHrr3oTPQ4e3c0vrIQA2e6+Qi128ixpnNOAishH4M6AXfYT/2BjzP0SkA/grYAswAHzAGDO53HZWg+ak0kQxgzvvYEOmiGkKDbhidTz0CQvXl9jVr9TAmzoH2HXVMMcrmvB8cHwHL77aR+pVDcUkpsGzVYtu2ZCcEoKkno6XWtfxSIdW/3U4z7PZWz7MUSNgItQHZqKWo1DS9ZyKsNjiRSXM3ozL0WI7h3Ltdh/TC5oJT4dFxoKoNVqak/Zl1uYWmA27GKvq3yILjcuC0EZzCMWHsKvClqxSyzZ6U6fVvJgNfV6c0bRccTxNJjr34SmHVFcxDFJQ7g7YvUOV7d7e/gI3JvWF0esmln0JuuLQ5WZptZa5193P5Sn93RdzN/DlF/cir6Tq5y+K9UsATpl6JWgw6XF8WvnNRzq7mEieIGdjHyuJ4wb2ZA3N5klMRYnD5fMrYjTcci7hIPR5NX5+4z8B8I8te3lwWENZPdk5nj+4AW+ucR9Ws7r/as7QlS+wPaH3/ukcjlYnwdszSn3ctH2cP8vdyqPlHQAkRzyqwymemdsEwBXJ4+QdfUjeSBo0lxJW8n6tAf/eGHM5cDPwsyJyOfCfgG8YY3YC37B/x4gRI0aM1whn9MCNMUPAkP08KyL7gH7gfcCddrU/Be4H/uO5GljVZiOLFR+vJAsSWVHSspqHUpch6NR57e07DvHB7u8CcHVinLzjMZvWjPp6f5JPlW/n+KwyT5yqq1V+aLGI1AwJK8xUPJrlc0kt2Nh5+Qjd7ggZK3LT3JKpahzGgywvlJRB8YVj11Id02l7unLqMUXeY2pCeOjxPaRv1u10dj9AW6iu5HCQ4ZnSLu4dvRyA54/11b3edLbChrYp1mdm6tuMvOxToiGmkdCtZQ3JTJUtKfXAu09TPhkSMhWmOVlQL9+dcxuzn3ChOxq6mrgEKHcY/O4iV7WpB35FYrheuLSSJG3kNfa4GVzRSqGrMsf5dtt2ZvNW8XBO6jopTqDXzVRseGVemBzXEMqzXRt5a3YfG+yJW0masRDqPioVDydSVKywrIuzHBNnLXDFocfNcoMVl9rd9SA/2aHMkq/MXcXLg70YT8+laaJptl05zs9sfZDNNj7okTl14xYZJ1Ev1PFlnN25Eb6b3aILjU9yQvjHF6/Q7V5doK/jEYA4lHK
RYlUxcBHZAlwLfBfotcYdYBgNsZwzlGxdcBA40Fx6HYJ91qi0G5wNBb5/t7IP3tn6HFcn1Lh1WTlI37ILbk4fYXZjmv85faf+tpDDtQ+/zKtBqFfcTTlMT2qI4dnCJrb5Y5SMPlQHqy18dVrLqe/5zo04ZYf+yzWmeHK8BSk3WdJF0++oylBmIDXi8tRJbaj69dRVuNYi/On+mygfzZG0/N90lbpBKfameHWLi9Oj2zHhaabwogYWVLTqnVtf5taMlssvppgtRmCceoNcIwurWxeHaQJ7LYL+Em/beoDva3sSgF43xJezq07MWGN+feoIV3YP8eCYvkxq04n6S9eEyg6J8hj+nFAZ13vm1ZkOxnuyhNgLeh7a9przaNAieQCfCtOhnvxy6FMteSTL0f6Fal6PvTs7x7bESTassgFBVhyuTB8nk9VK24qbwp0TZFLP4+FCF6NteoG3rvmoYpwPrNiAi0gO+Bvg/zLGzEiTXrQxxogsHS0UkY8AHwHY1L/y94XT5OIs3nKk+VDtrfAje57iA22PA7DZM+QWJa4iydE+N+DG9GHeuV3fM18u7aVmO3XXE5vWw/SKUk/sfOXYlRTCBF02i3rfyB6mPquGd/N4jUoeJkY0XnzTe/fxTEq7pFQqedyK1GmEzR6bhKrhPPukdoL/5Mnb6e7V4orqwTy54UaxjHGoe9f+jFCcTDGUUq6eM+M1aGX2HDUX70ScbNNSZWNqgl5bkZJxTv+gJyTAd6Ps6/LrGWkkkHP5EnuyQ+z2dR/Npe2TQYHDtYXXvtsmBBZ3PXHFqVPXtvpFfqDrCUY26/EeHt1YT6hKCARNmiBFU+dozxRTjAYtjAaaknEpM2Ub/o4EOcaDHHlHx9njztHqVFnn6flvyxWZyOuYqvN6Hc5GYfFcYDqs8J2ixqc/88ytpAaS9aKitSC6LhnHp9OdY2ObKlG+3JonMePW79VK4FI9hzTJGOceK/IjRMRHjfdfGGO+aL8eEZH1dvl64ORSvzXG/LEx5gZjzA3dnfHNECNGjBjnCithoQjwaWCfMebjTYu+DHwI+E37/5fWOpiov+OrtYBXyxqrLs0nyDTHkx0VRgJIZivsSg2x2VMX6XTKdFAF0AMAACAASURBVEnx6XOLdQpVvqVIIbcwthp5Hm6JemOEseNt7Muv4/i0xrnD+zppmW3EkCd3O7z7fd8B4LrcEW5o1Zj7X6ZuYP7RrobwU9NYxCgNTmyxUmImQXGf6nqnYIHCnxM0vGp/DpwBj/JxZdYkaTBbToEDYULPSyJTJeNU8FdBmvCtaxt1x2lGM7slsJS7re0TXJM6SsbOeFxxOFrTsNMjxY18fvhN7P+GMiqcq6f58G7tsvO23Evs8f0lmROtTpp+b4peG/M/kAsJEpHQlOiMqWl4bkkPcG4iwzcn97DFHwVgPkxyr+0J+Y9H9jA3lcbx9WLvWD/KRzZ+mzZXL9QPbX6C/x28CYDibBdu6fSiXecTBQOvlvW+YNbHnwPPygwEvqw5Du/hstmbYWNWZyr7cusJki6uncmcLOQZrmnV73Q48rppiPJ6wkpiGrcBPwo8LyLP2O9+GTXcfy0iHwaOAB9Y62AKVgHvxUoff7LvZh3gYHKBkWrWgxDR6f5K4CDkHZcrLLVtd9dJHu/UqXI44dcTdaChlCjWKoFQrnnctUGpVwfe37PAEP307kd5W+4lAPb4Po+WNVF4X/YypnOdmMmIKL04DtTQ+DDlhTHVxXKr0WevpNWjYRP1uM5LXkzvcxoUP9cNSUl1VZHgolVRlOqpVr+5FVutWy/Oltw43e58Pe79XKXEfbZb0P98/C1s/oJDLzrYucEWPlG4A4C33f7SaceRd6psSquB8TpKVKdslWJN6lolYDVLLC3UmfZ4cWIdD2X1pTsbpHhoRJsvFw+3kJ5y6ufm0PAGfnHw/dy95wUAasalVLmwXQ0iPZ0D1U4GClq96845SK2RwK/lVF8GoD8zTbdTgNMkL5e
CKw4drsu72p8HYH9/L0OD6/HmdR8njnTyBSufu7P/H9jjB2tS0oxx7rESFspDnMpxiHDXuR1OjBgxYsRYKS6qSsyqpYjNBmkVFcLqUdQWNgeIvKcwFI5VO5gIVOs4I8t7CK44pPDqmhdAvU9gkAS3KLgRTa7JmzWOYUN2ire3qIf2Q+3z8BON5d1uZUEibp2dim/MTvJydiO1nB3PnCzQuj4bRPom0bmQYOFYF6zbFGpqz5Rsb8SVxVAqxqVqNcclkFOaB0QVnqEHXlIXZpwKAVLXEQf4xNPqZXc+nKBZvGP8tgr/5toHVjSWDgf2ZlSX5pn1G3hxToW13JJvQyaNExDNRrx5YWI6y2yfJkMfH9/M8JAWTSVsRW5UrOiUHYL5NP94/AZ7vI1QjF84dWYTJbr9WWF0Psfhiob6tnmHWW+LsdYqADVsz/cXxm7kkccvAyA7LjjVBjW01GG49QYVB3t/5xOsO0vHuNVJs87VBG5Xeo5jaUNyQo8/OeLxbIcm5R9p304q+8pZCbDFOH+4qAx4hJRTpaNFDeFoKruQsiUNWl1pMsW3RndztRWealtU0bgYDg4ddp7dnijgpvRJCVKmbpSa9wOAH5L1yvR7Goe9IrE4DrjwRu622/n+jqeYvyHJdwuqopgtCkShIEHTx02qd9LEIlkp60GMii3pDxvfgY1P5/T4trROsM6bIrWC6W9UBVu2rBGn2ijTX7K7vX3RjpRbGK7l2ebN1pf9rDXSn87cwvyzrey+S0u0P7ruMW61DQYWs1AWI+8kuCap3PLRnhc4eFKZO0HKw7iNayYGnJrlfZeEYtGnGKixubXrMDMVzXdMHutVQ9907t0K9XyEUwOpNcnZNh+zabBe/DmYeqGTP3JvB+Cy3UN0W+bOWmVYR0O9h4eKLZqfAfwZ7X5USVt5hHzIloyG63b642vqou7bMGTCCRbQGvw5oTig2/0D/3vovnyWjrQyhzOXWPPf1ysuKgMeeYhZp0zZNidGVEPERDFq04h1etMux6dbGVyv3lUpcfpKfl9cel29Q+9uf5Yj6zUZeOjEptP+LjQOwbJRpIWIXiA7/XG2ZMZ5xMYpQ9+lTks26IOyxCZPZ7yDpFIDm5OIft3wLPyhcbEC1tCfmqLNKeOsMAoe4lCu2WRhjVO8/Migu2UhmPHtb4QWp1Snbe5NuLS1KD//jutehusav9/m1VZscDT5rAbmqtQxtnWr0Xp5OENywoUmWl1zPByg01cnoC8xSXdGP4+lDNCUANQ/G4nKJXIPjS8aH715gz8nTM3p9R6odrHXqvit8FZZNYKUUI1OW77KjpRNyC92PlaJvJ269KenMG1VgpP6svMKtrE0MDeX4nClm+vsy/Ri7V5/tDbHaLDw5bLN09nf6Zy7SxVxfVWMGDFiXKK4qDzwCL7U6MkpBW0y20roOw3mSRPbIDEtzJ7M8eImjdPtTAyTsQUay1Geou973FlSro3LmkVCTc3Vh0WXmVqS6vksvTsD6hWVWShsrJHpUW+ysr+lHq9dTCc0AuKpm5nzyuSdYMWVkaXQ1wpYNLSwuIQ+8kydKvWGFcXAx5EQXxoMjig8sumUu2x10+/Iq9/mzfHPepS18kpnL8HQ0tdYQqC2tFcqRkNw0YxltQ2X63D0nM9NaZz9G1OXMx7o8WadMr4EZB3btMKdrhdRdbvessJQgQkZCgocsOJSI3O5OqUPDLUMlLboNj98zSPcmFLKak7WxprpsLf2NdmjPN63mcEhLUzzilLPK4SVi9TlZqG8xX1ze/n0/lsIn1Vhs913HeKH1j0GwK3pY2cM2V1quKgMeMQh3uhNsS2vU+WDLT3UMm49FuiWTb3zujcvpI773ONeD0D1Gpes1Y7Y4et0PpJMjRJLEdd8PGxhumJV7sr6QNfjx67US8SNZ2jzi/VmBCs1PhnR1lbkrW5K3q2X7rvFM1uIOl0so5QxgFJ/ld7+SXqy+nJ7YTBHOLH0g2Uc8BLNLbZkSfn
YpeBI2EgiL2XQmmP39kV3Pqv2osR03nHrFZOeX+OU3TXnAQLhSElDZGXjcXJeT6I3L6oyGIVb1jAutwTZQ3rPPji0l2+llTaJYwhTBmnRe6ajfZ4f3vIEAD/Q8tyyuiI1Ao7X0nxq4DYAZl7sJBPVEYSabE/ldJs7kiNs9fQErDWhmLe/35kY5sr2IY5mtVrZOG4jgV1xKATJ0wjVvvaI6JbPVnJ1eYuv/N0t9D5RJbT3/sDUdj7/fl3/1q3Hzts4VlppfK4Rh1BixIgR4xLFReWBR55Wn1vlRzpVBe3kphxPFrfgltVLSE5JoxFDySBGCBPqBd3btpuM9ZQ/0PY4vW6xnhitGsNE6DJQ04TnX568mSMnlNGQKTRU7kDpcZU2/fvOa/fxnrZn6HBW965rdRLcnH6Vn7pGZwSfrN2OV9BjaC4aWg4RXazcadh2k7Jsvm/dM+TdEt+ctP1Bl6iSrMMB1y5PSZWM+Cuit7kibPImaM3qlH8sm6/rTke6INLkgUdt6WqvgWaGf4YkbLPgmdSE0ZJ6P98Z3MLsEZ1SZ8ucsySjVzSNcN5U80xFCH2Hak5DPGPbPfZ1a1hiOLuffjdc8loExjBQ7aonRn3bFCQ6puUoo2s+DnteU1Ih65UxfpMOT9M5vVgxUO3iy1/Vwr+OgzrQcl6Pqfd9R+shlHONxZXGAPu/sX3JSmM4vU772eKiNOBdbhZf1ID81PoHAXiqqNWPbsnFb+pv6VQNqVF9Iiu1Vv6qqOGUucuS9CenSNog3lCljeen+jg4rKXJtdEUyXHb0b2o26oLQSWg1mopeOlxtvuTK2ZNRNO6wUA4VO1ksNxmN7pyBbtaSii328/9ZX5wvSr8vTt7kIKBg2kr/LgMk2Ut8HDZ4BX4ma163j8R3Mn8rL7oTikrN9Qf8FroEFzgCV3NStvWsmByNbpT+oAdcdsXGPdzaQgXvDSaGSw1Y5t6gDfpcXBGz+GBtnX0uUcWcMabJSQOl3so2aYNucUverPAzzhniF4mHU6NPelB3FYdTzXv1o8BuGiFrbb4Y3zv3SpncU/njYDwAzfr33e3PsvVCb0P1kK1bEYUc19caQzQS3VVlcZrxUVlwJsRJRu3+RNszY7zVKtS/WrZho63lpY3ElISCm5Rf/dPh2+g0hli7FPllBxNylgDlC4KnqWguWX1LOsJrSajGK7SQkbdef5y6hb+6qXr4aiOJzPd2N+ZuN5BemGy6ub0q/acJCgETdlKMY0CJ2eh4qFTgUJB3/zHqh0cqZ1gs73ap4uZuuKwwcuxzXZ36c3N8kpey7nNpJwqCWBPj+eEuIRcyKhcELXW21Lmx697lCvTxwHoTszx5UC1UKpT7bhlqfPnz5Wu9ykFP6ahaePNCQMH9aX7F95NXLZliF63oVU+Z4ufHi9t5VPP3EbKNvBeoDx4Hl7Wi9HhJLgxdYSfvFK9x0+W30zixMXL945ogbelqnS4OuYfepf2A/DthVWdnfNDHzwXhWprRRwDjxEjRoxLFBetBx6hw4Hb8gc4sk0ZBY+XtxEmbEeRGcGfZ0FMPIovJ6YhM+RgovLxSARrgS73Ii8s6t3rANZDcs5yvj0XJKnN+6QKUfxYPX1YeaXlUnBoVM7RVFZvZKGD5pbAO6Ge3GdzN7PtqlG63TVW0S3lATaFUKrGI7ANntdaTn42iJhDqVyFHakRbk1p0UnVuDycVTGrY9k2ElOyYLZ1XvS+TeO+TE4JQUbDD6PzOU7U2tjjqwa3Ly6z9j48WOrFzHl1BUuv1Jhh1TJCpS2kr1XDAVv8MdwVSiOsFBknwVavwrakzr5SuQqB7arkzbqcKLVxqLp0H9cLCV9c9iZe+/DO4kpjUNriaiqN14qL3oC3OiluSg7Ttv5bANyXO8k/HNV2Y1NH2kgNuySnI92UBj2sod+xsqcz6nAPmsTEcqgzbvmM05TA6LrjYZGBml7Idq9A38ZxJo6va4xnhYbCLUHqiFqjz7i3sv5Gfdg
7strF3q/PzcNlW6q5lUaDg+J8kmPVDgrWoK1oDPZN50m4IEzTHP4RY+ox8WLNZyZMUTY61sxZduNZKRzH1MejY2lILBQKPscrHYxa5cltiZNc1a7HPtDWQ2XWJ2EaYbjzkRwUQ0NvpWLqfP1ixV/QyX4yKHCgqiGqgUIn3ox7qvomUM1A+2UT/ORmTYpv8IqnbZ12rlDXlykI9z+9h+wNVn2y55u0X5wh8fOO6IVxLiqN14o4hBIjRowYlygueg/cF5ceN0NKdOrotDxPeYMO+z6zm2nTCpa94s8Krm1+bFhUXbkIS+lnR01iqy2GXJtmkLYlRs+oNRHanQzWPP5kRMWNHjm4De9EkkQp2sHKp+pe0RBaDYryrF8X9S9kDpIU6lRJLxkQOXMqxrX8DqrGPSX/eDq4dlueEzToiktM2SOWwsR8hiOVbgopbaKQNN45D6NUCeqt9hzHEDSFkDANeqZ/PMmnk7ew+doxAK5LHedDneq59t88xSezt1Eet4nCgoNbVjVKWBjqWqzCuFosCM3Zz4tZJKOh4W8n1H175IndZCcaNFlMQ/mxljP13pfAqvtfni2k1nReCg7DJe1POhGk4MLKpl9wnKtK47XgojfgETKO3i2X+/N022rLO1v2cf/GPdzzhFZhhUM+/nzUiGFh84eoXD5qokBTg16VaZV6xaO7ZY6f3a3xrauSQ/UK0aVQNlUGa2o5nizt4JlhLev3jyVJnZSzMwZN9LzFcKEuiZtIVqnVObtyWnaLLwHuKkKmkaH0nHBBnL0+Pvt/FM0pFJIcr7RTsBaqfeW7OiOiEFXZhEzYi2SMSv02IzrX3rxQKCQ4VtW8yU2pY+yxl7Ct5WluuukQA1Wl9R0pd3FgvodHD20FwH81VaelOqt54y2BBectkgBedA2GgyzH5tvtuB28uYUdmU7J01wgRDLOnmNfoBJyPppFx1gdLgkD7opTl+hMun5dCW2zN8NO/yGuv2MAgN87dBcjJ6zpEIOTDDBT+jZMTLj4cw2PZnE7LpxGh5MNbbN1T2ezl6hrcSyFQljlJWsMPjVwG4UhNTApS1k82wRZFHt0Z516V5YDuU76vGl8sYlCNySSqDgTRbdqXFYjR56wLmPWrRCmQruPhR61mAYvvFZ2OVnOM2i7HOelRIvV/FirJx7NcErGMFLVHEOpkCBVaui0iIHQa+QwxAvryV5XGtTJ7U6CTV6VG5JaHDWROcwzmR5OzOt2j0ytw5uP2ratzQtv9sCjl5/jhITG4XigF/hAeRMjc3rP+LMLdWeCRMOpqOZC1qdnbOcdWG33ndWgMcsJCZtzDCVhqqzXdzTIMxdqvmM5bZcY5x9xDDxGjBgxLlFcEh74csg4CbaLR8ZSdnbu+RyHdmi8+MXiBr59cgcD81pAIYsVB5u9UYFaGoKMeh4dqXl6XBtzX8EpCuxcOQgdnEgdsLSQFbMAxjZKWMIjjqSMo+Kk1ITwyBPa2zF9U5X3djzNiGW6VKsebtlO96u2mMd6jEYW8goLQZJKNE6zdCl3M5J2Q21+Acmoxx8kPdzSws5CddXGWY8Xx9dxX+YKAFpbniJpZwprZaREXX4Ga2kGy3rs4byPU1mmyYQLfrJWDzUtPlIHwbHfZkTY7o/zznVaMffJkx2Ew2tr3ru4ICxICGFSB5pJVMm7RQaqWqH7qYHbmHlRZ1iZwsJQSehDqVN/d+uNL/ODXY+fdeed1SA6b0m/xnyiIdHplYTDIzrbfLh9FzcmtXgmvYL7Kcb5wYoNuIi4wBPACWPMe0RkK/B5oBN4EvhRY8xyPdLPG6LKQYC8U6RktKnDk+FWjgx14k/qHe/NqQh/8wMfKf4FaSh3GDZu0wTc+3ufos+KwDtnyNRkHJ8rbLjlhzY/yScmtDrLjKZPnX5HtLIa+EVD9oTNcIaG4jqdhoa+EPhgKpFhNlRa9OF4dqwPgH2T+lLK/UMOvxjWjyX0GqEUMaaegDIll/FqlolQ91E0FXJy+ml
vylqSnsQsaVvTHSZSSNjg2hsHfHtC/RmHyekshzr1BTqYzbPO1e48a+3eUjJ6Igeq63nwqEoq+BOWbhcZvKZKReMYXLcphLLENpNiawnExaXMhsQEoHkF4zUM+IIKV9PYRzUr1NLUG0xLUxLVLSqNMzLkQRKqXWoUd7SN4RMwaDV5puYy9bxNXfck2p0HQU53viUzvubOOyuFaw846deYjSyEAacMtSb93Xkb7umMbfcFw2pO/b8D9jX9/d+B3zXG7AAmgQ+fy4HFiBEjRozTY0UeuIhsAO4GfgP4BRER4K3AB+0qfwp8DPiD8zDGM6IuBlR1+OKkMlL+bt/VuINJElOWBlVcmJAyTsN7qrQa/F0z3LVOm8RekzxOp9ViOXOowWe9pdltSIyTyuhYgkR6YdFLDTzLkkhOB2RfOgkVm6l0HLwJ9YhLG1vxZyoYT/c7vyFN7aR+Hhvo4IHnu+j/lv4uPzINNfWWqt0ZZjYlG8lM0zhep+Dy5NhGrstpA4Dd/qucyY/L2mRvq1sg4emGSgJeCXIn9BgT40VmtyutLPRd5pJpnspo0+G3tLVzuT99hr2sDFGl4rFqByWr6JiabzB8ItRZHz6kElVSTpWl0HxNXXQWtd3XWVRrtshYRo8pSAteYel91DJgrpvhzRtVp8aRkHv3qUok077OfiJxtHRA3wb18HdmT1LFrVNDS3OJU0SrIlGuao66nrxqui97is4popCgMbIg1OgE2kgcoBAmmLWSmSEm5qNcIKw0hPJ7wC8Ceft3JzBljIkIT8eB/rUOpmqnytNhqT4zjh7e6OYtGep0OKf+O/3/3vmr+daJnQC4x1NK47NRiogSFs0AjajqIEA1b1iXn2d3SkvNN3hnz5xwXRvSWOKOTsza6s5XpyEMMdaASz5L0KovjNTzGs+XtBr0fLlGJdei47rXkBwvLSATz+/QZcV2V18WzVKvNqDlzQszxRSDFZ22T6cPn7GnYcZy69f503RklfkwVO2g9VCJxGHb+9FxaKlGQfA2qnmX2Rk9jpFaK6VzXOFYDn1MwTZbXqJX54J+lqug/0QKjAA/s/VBPhHcCcD8TJcqVTbvIgqh5Aw7Oif4vk5VirzMH+OHO1VIabjWykyYrlezZp0yfX6jX+sThW3876dV/jQ1kFwoWgUENrpV2lzhw9eorPLN6VdpfY06wYf2yZorJfFKDVquBMCo8uef7e1ntj1h1w+IKYUXBmc04CLyHuCkMeZJEblztTsQkY8AHwHY1L/87gphhRGrtHeg2s635y4DoNUrUA59jhQ10bM5Pc6hgnovk+UMKa/KxrQ+HM9MbmDypL5jslOCVzQLOLVGGlSzIKWGG8C0V3nbupe5PKkGPCWry+1GmhSbvAk6Mvo0Hsu3Us07+HO6jzAF01t0u9NbukhNhAQJ/V1iLiR/UJOmkkhAGBK2qY88uy1XN0yzGzxmNzTGNncrBBO6jfQJITm18Hijz25RmJ9OM17NAtSTmadD2iYet3jj9GeVLjY53o9brDbIzJ5LYYu+QGopfVmaeR3fWDXPhH1D9pnqaamYp8N0WORITfdxrNSOU9Cxu1ECs+lQ6q3eKjBXSFEKV7bP5jzKtsRJum3Ho+lc50IFxiZL7s0LI3M5hm0y8urEGLenopM/TtnUqKIvt7IJ68M8WE1xuNgNczYGP9tUm2ALfqJEdjJXZkdSX5ZbPXfNnXfWAjEGpyo4Nmk+X0kwFSqVMTAz510pMcbSWImlug34XhF5N5ACWoD/AbSJiGe98A3AiaV+bIz5Y+CPAW64OnU+ZINixIgR4w2JMxpwY8wvAb8EYD3w/2CM+RER+QLwAygT5UPAl1a786jCbigocLDawhcntRfgfQO7cWyVXT5doha4TE6r9xiOJ5CousCAUxWeimK9VchEJdHzC5sPRN53RVlolDsM9Ku3fMOmY1yZPs46V3+QPANDYzGijiYbvAI/vVkbIfxBeAejhV6M2+htWd6q8Zy37H6
FrZkxXprVLi2PvrKNju+ol9n9lEOlNUGpS73HSl4odtu446Ya23YM833rnwVgOkjztUEV9hoKu/GKbj28ov0qreZ0SWDWY7SiXmbBnPm9HYWQMk6N9akZAB7ebUiPZ0nn1BMsdvvUUpaamFQP2J/Sc/HNE7u0JyjQ5+5nvXd2HvhEEPBPM6rlfe/Le0hO22bLtYXrmaZydQmEIHCYtaybpSPhS8MlrFcbLuVVRowUf157Vn7K03v2st2DdNv7pxBWORY4DFiRquFaG4dKPQDM15JMVZtoik37MAK1vNRnhl35Aht97Q17rpUHTwfHnshMssJUMgo9ijY9sZGShBvgExVKxe73hcJaeOD/Efi8iPw68DTw6dVuIKqwGwkS/OaRd7F/v4bRE2MuWBrdTK0VJ4CMNcZuuTHFNI4mVqJYrzarXTrpFPqiXb17LC3rqkFu6dIEVN4tsc0fq8d9V4vI2G3wcuxMDAOwrXWME11dhL5tPLt7ih/Z9jQA78k/yw7fcKRF//64+3bun7QNB3J5EtOm/qCUOqQ+5s3bTvLB/sd4h1UlPF5L81xOz9lgpoNqrtHsQsKGUqBqfTjMVNRwlFZgwCO0ObA3o3H5v994JaPX5UkPq2FMzC5sKOFUVOIXYOx4G0/3aBOOWzOHWL/iPTZQCCscqbVwcF5DZkz5dalVp2YWJImN08hphJ4hDKWu53I+WG5OTe+9clXP5VSY4UhNje3hagefG72JB/ft0nWnPMJWfePkOgr4boA32/QiapIxrmah7Urdzs9sfZDNNjb/WigPngnGAaxj5btBnW4Y48JhVQbcGHM/cL/9fBh407kfUowYMWLEWAkuaCVmaKdqs2GKQjWBaxNUqQnBn9U3fWLOkBqrkhjX8MP8lhyVnJ22p6zuRTTjbXIIIq1oY5OW1aw2CO7YqXSuu3r284788wDs8KPQydqTRJHX50sIyQBbNEl/6zS76iyXGr4k6LbCz/+i67twq673wENXgnHqnmWl3XDTjUpv/EDP41yXHKbN0ctWdYvsbdHUw77OXirDfj1kQ23heZFw9e3hAHyEdZ4mMfeuH+S749vxZzQc4lYaiVJtIWbw53QfyZMuL06o3/18az8dzsI+kKdDFFobCSo8Mn8V+0a1cCkx5dSLZWRRD0po0EKDjKG1pUC3p6GfxBmm+IEJGbK9TA9VtjBa0HCdW5IFTTOaC3mCJFTWVXlzj57/vFNisKYJ9N88/C6OvbCO/AmbcC1DkLCMDT9BBUhFDNKm+I4Kqpl6EnVb4uRrpjrYjCjlWqz4jZ6Yos9QkNNnpTs9R7er06EzFbvFOH+4oAY8Kmfu82a5qXuAv11vy6RHMviWf9uyfxb35GSd+ZAvVZjfrnS4Qo8LboPaVctKYxrt6pQ6tKXA1ZzB7S6xt0uF/W/KHmS3r4Yi56ytdLoZHTaesys7wnPdffXvb+08zBU2vNLqpHAQelw1FNv9SbZnNF58f2+ZYpiqS7j6m+a5u0uF429NjdDTVInnUOGW7AEAnujYzAuplkYHokWQKsyUNfQxXGtjLBiifQVc94zj02crKrsS87i5GkFabxszK5gmtbyoyTRAYloYeUYN7+9U3kZ2z1e5LqnH3+sml2WlNKs7fre0kb89ejXFg3pfZKalkddYIh1eN+AdVW5dP8Bl9nznVsGA+cfxqxi2gmhJxxC6Uo+3N/dNDRLgpgLyvjoWLiEvlTWcNTTZQnLCqTshzVWalwIi9k4QOAsEuWoZQ99mDe+8r/sZ+mzM33kN5VNjLMQFNeBRF/o+1+UH2x9jfqdyTL82vZf8UTUqtbYkeJ315NzknnxdtztICEGykVipZQ3lDerS7Nk6yPBsvl6Esqv9JO/ueL7+UPd5tVUnK1eCDlcH8/0tz3DH5S83fV+i16r5+Yti7XlH2JHScXV3zuJ2T3N5u9LH3tn+PFdZeuNiQ5RxEnRadbr2ZIEwaRovsKrg1BOaBq8oHDmiseS/yV7P5f1
fpdXGM08X+fdwWefqOf1A53c5uqGd5wsbdaHxSY/YpHHFaD4ikk01pq78one4pQAAD6tJREFUOHOgnf/K3fyXPV8F4M2pEZJu41gKYYUJW4x1uJbj0XlNzH7mpVsIjmZJjVt9mXKDz78UzTs69kxLiVtaDtblEJKy8hf03V3PcXSDGvDRoV57MNT/j/brViAou0zbhORwrY0DRV0/CBwct+G5y2JJhcWoc/eX0c55jRCYkIKp1jsG1WoO9vEh9KDaFrK1RWew2/2TK3IAYpxfxGc+RowYMS5RXBRqhDknxW6/yEe77wfgtrce4Kt79wLwnSd241QT5HdqHHZqpNZoOJwIQCC0MVlSIds2aUn0h/sfYrs/qrFoICMB3a7XpF2cPC/H0mq9klaHRR1Lssv+Jic+19v+jb+5+4scrXaQtaGYG1OD9J8mdhwdX1diDmmvEIxYr6gEREWSoRaMOLPqaw8VWjhWa2WbH2lLL++Du+LUdb0v9+f5uf5v8BlPuw49VthF9oQtuR8o41RCCuv0vFYDp4kR5DDttnNPl8oclDqfrc+EMk6NiSDFN+euB+DekcsYsDMF/6RPelKwRAzLMlo4vjoLxhMCW2aQT1XodmfqlYtn8hCbC3kuSwyxKa+FYYMtPSRmBSc4VUPeqYA/7PNsl4bJ3tH+Am9tVUXDfevXsW+2H+yMyZujfgzuaUTZxWiB0NFJnQF8u/MysqLhs163QvdpQk/nAiGG6dCwr6jHVC36eHa4oQfGM6TtbCzjVHHPwww2xupwURhwUMO3106B9/gnuSr5NQCqfV9nNkwxHljFwSuKVK3BmQ+TDFbb+eaYVm1e23aMt+RUb2uHP8N6N3NJTO8yToIttvqz1Zkl75Qay86QgOuwgiebkhO05IvMdNhWYWWnniBzAptgtNS1sbkss2GaQqhaJZ7jnvY8Rcu63CwtTokwaghcFLqe0YSbNzIFIuQKmsibvixPGDQ4+ZlBh8e/qVohj27eysZeNZLrsjMcm21jaEQrGmU8QcKq8yVmRDVsast3pYkMeKUFqr360ruic5h13uxZJdd8CelJacxfOsrUJtL18yhBYwxOTatea6EOwJGQnZaz/Wtb/o4n123hyyNXA7D/O1vIDNkw0GlI6doxCiq2w/mnT34PD+7cAcDPb76Xm5KTC0JP5xoFU+FYrYX9M1aCec6rH28tA25rhQ0pvW5tTm25zcR4DXHxW7cYMWLEiLEkLhoPvBm+uOxNNE/rQwIzdcp6ZTPDnBnkgy0v1r+LinFyzmtPv1oL6pWP4tf7Nyr803rHUZjg9swrTG9N8+kJrQwMJpN4hYgyoUU9kbDX/GyKF4ob6snRjBPUW9adCXmnwknrZbcchDBpGSmpJGE+pdotaII5Ymy4VYMzBZ6lGAYTaUaTGuo56fUhNchGTI+mphtuRZtiLNcfNPRUlxug3Bly/Q5VW/xQ98Ns9swpyeKVoNupcVXmOABP9Wzk2JyPsbTNxIzUi8ZUSyekK6X0kpRU2WC944yfYKN3gISlr/zXvvVUiil7fIK3WOVrgcaKqRdjGddloF37eh5a38PlifFz2mt0Mcom5EStnUMTWkEaFRsBVFtD7tqxn7fkNUzU4VyUpuMNh0vmKixlxDKSWHOzgIsNqxUsitbvdefYmhylo0un/zOjiTonW2xvTi9SZhxNcO/QZVybGQCgwxld8dQ87wR8dLM2fP7zf3kzx/9GmwFLmCb0G42FI1phBDGN+K9bBWYXbtcsEylaTlQwdKGWFkrdusLWa07wvd3PALDbn6H1LF/grU6Ct9tK18t2DvLUhq189vBNAEycaEMsL9qkQtIdRfZYttAWfwK/KT6dE5+bUgMAfOxNX+b32u4ClJGTmHTwrAKhW1wiLl5npQi1wCoDBqlV9TRdDSI55mM1nxcL/fyf9s41Nq7iCsDfubt317vx5mE7fmA7DxInIe+YiIRCEx5pS/iTVvyBtiqVkFAlqFqJ/qBFlfq
rv0orUQFVUSmUVqVC0JIKRIGoApUQIAkhIUEhTuISbOfhvMzaXnt9d/pjZtcbxy5xZO96veeTrnZ37t17Z8/OPXfmzJlzkqes7GK9kjNRBYmA+RXnaHaZqkqtgzRdKRkFrvx/El6YZv8sjTPt4pXztQlSA1a5x08JMmjwnC9ytNuj47/VfNiwALDKJ+EMvV82STYvXEkibidcm+e9wpEH6gH43bFNnD5aTbTb9nojF/OU+RUkd77S6K8Z16kOKoRUraFiibXj39Wwl5ti7QDUhK7erz/uRZjnHoqNoQzzwwfZsNIq9M5lc7gQ2AllXwLmhnpodIuF6kKZS/yh416EJb69vRJeO8tW/RmAQ0saeevCUt7abV0lK7pCRFz4gXDKZozKukMGFYZo1GXyiZ4i4U1OzJGsC+ebyXW8dGRNLp6NF0B6hvPrnzXArHAfvoY9mVKoDVxRFKVE0R74NCEqPi3hJD9pfg2AX3EHB1I2mJTfGyZycdibI5wSwhfCvNZpvUKaIueoDx0FoPYKTClznFvjRi/DUt8GBGtZdpIX6m7glcM2KNfQidhwNqQ+2xu/LEcojJrgOWtOyfiA5CXhCEEQtR/6aw2Z5hTr6q29ujXWTl1oOM/lRJB1L8yFYI8mgeSIo8bu7ecHOcueY6X/ORsq2rl1i/WW+uX+rQwcsnMK5oIQShmGnHde0JTiB0tt4uBV0a5xrSgdD92BPW9bXy0D52JUuhjzgQ+D1fZPW1xznmXRThKeJm6YSqgCnyb4EqIhXEncs8bVBxt38AS3AfBxciGhAY+QmzzzBiF6XjjZZafEPprbzMaYVcS147g/Q+JR48IBxGWQquq3aW1tB+C5uhtp+9TGQol9HsYLhCBvlWYu2cQg+L0Z4l3DrpO9jVaDDcU8hiog4xJfBBFI26i7pOvTXD/vBN+daxXcgvDgpKysnWiyppWE1w5AetUbPBHbBED/ezVIRkjPtP9T1exeFrvwA00hf9ISOgy6gfjFdAVev0fglkgMzs5Qt8CuvHxo/uss9y8SH8eqVmXyUROKoihKiaI98GlGdiXo8sh5bqmxUQyPLa4ilZpNxZlsTBFDuE+IdNqh88F5DRydZV3HmsJnc+cYD3EvwrW+EJF2AE7VzaI/bc/fOWM2/b1hQr0uFkyPR0W3+97pIRIHhhM8m1iUGW7x0unWOKm5hnTCurNk4gFe3HbdF9SfY8Oc4yyL2IUl1V5pLNqCS1d+fiV2jAPX2ETQ21sqSfX4yGw7qbhoTjfzwvb3+eNM8zceZjvfyIUzzrKvvolUtR2GbV5yhHtqdgG2PdWWyMK4ckKMmSTfpFFYv6bCvP+v5oJd78s470KIHhu69OaYGxpkXhHCeE4kAyZNW9oqu5d71vLUrk3Ej1uFGrlog0INzrSKMrk4zYaV1gb+s8ZXuc73r8qOHJgMSWNdXbqDgDOB8/UOElwI4nSlrcnmb8db6d9jHxi1e4aIfJHOZdNJNkU5u9plMWoYZP3idjZXfQpAdShJ3LnSVIeS1If6cnbv4RAJpcX5oI8TzrbUOTSL00MJqsPWzt4YupgLyFXtxSZNeWbvg5MBnMsMy7HKS1HvmkF23kMpDqGGtj3GmPUjy8u2B/7ZUJKd/fZh8vzJGzi8YxHeGuuSdt/Sd9lSaRcsXK0yKzZR8alz8caXxzpYuriTI73294b7PUK95BIux0747J1p9+2cs4jEjMNcE47mznOlhMRjlgzHglmU+2ofgUnSFbQDcOPKI7y7sAWAf964io536hm8ziqR2xd/xLrKz2y9KzpoDCXHUNIeUNoPWbCKcY5rXqsjA0B+3NkokxWzZ2Qd7OvIPaq0pzo6HlIURSlRyqoHnjYBn6StrfXN5Gp++8GtAMx/waOONMlO6+LweN9mtnz1UNHqOVHEXe+5xT/DilldtNXaxLoDPTEkkNxCGz8JwTHb23rUfI3UGp9b4tZ+vtDvvyqb+Ejy7b5NYVjl28TMWyoPwtLh4y4
3X5V+L1tRJouyUuD5PP7hZqrfybplWTvj2ZusyeGH694qUq0mlqzppy6U5ttVu8AGbWS7rCadiuei7IX7TW6Cc2AozmPp2+hYZe3V36/aiR+2cplIN7bssP36y4bt0ys0gqJMJmpCURRFKVHKqgeeH+Xw6O1/ZEX8OwDU/jzNff94jbsqe/KOLk2vhnyyPfCa0AxqQnB9w14AeoZi7N65Jrf6UTLg97pAUwOARPlP7bUAbKw8yopIz2XnVhSl+BTUjVBEzgC9QHfBLloa1KAyGQ2Vy+ioXEZnOstlvjFm7sjCgipwABHZPZo/YzmjMhkdlcvoqFxGpxzlojZwRVGUEkUVuKIoSolSDAX++yJcc6qjMhkdlcvoqFxGp+zkUnAbuKIoijIxqAlFURSlRCmYAheRO0TksIi0icjDhbruVERE2kXkgIjsE5HdrqxKRN4QkSPudTITkE8JRORpETktIh/nlY0qB7E85trPfhFpLV7NJ48xZPILEelw7WWfiNyZt++nTiaHReQbxan15CMizSLybxE5JCIHReRHrrys20tBFLiIhIDHga3AcuAeEVleiGtPYW41xqzNc3t6GNhhjGkBdrjP051ngDtGlI0lh61Ai9vuB54sUB0LzTNcLhOA37j2stYY8yqAu4fuBla47zzh7rXpyBDwkDFmObAReMD9/rJuL4Xqgd8AtBljjhljBoHngW0FunapsA141r1/FvhmEetSEIwxbwPnRhSPJYdtwJ+MZRcwW0QaClPTwjGGTMZiG/C8MWbAGHMcaMPea9MOY0yXMWave/8F8AnQSJm3l0Ip8EbgRN7nz11ZuWKA10Vkj4jc78rqjDFd7v1JoK44VSs6Y8mh3NvQg84U8HSeea0sZSIiC4B1wHuUeXvRSczicLMxphU7zHtARDbl7zTWNajs3YNUDjmeBBYBa4Eu4NHiVqd4iEgl8CLwY2PMJUF6yrG9FEqBdwD5udSaXFlZYozpcK+ngb9jh72nskM893q6eDUsKmPJoWzbkDHmlDEmMMZkgKcYNpOUlUxExMcq778YY15yxWXdXgqlwD8AWkRkoYhEsBMv2wt07SmFiMwQkUT2PfB14GOsPO51h90LvFycGhadseSwHfie8y7YCFzMGzpPa0bYbr+FbS9gZXK3iERFZCF2wu79QtevEIiIAH8APjHG/DpvV3m3F2NMQTbgTuBT4CjwSKGuO9U24FrgI7cdzMoCqMbOoh8B3gSqil3XAsjir1iTQBpro7xvLDkAgvVkOgocANYXu/4FlMlz7jfvxyqmhrzjH3EyOQxsLXb9J1EuN2PNI/uBfW67s9zbi67EVBRFKVF0ElNRFKVEUQWuKIpSoqgCVxRFKVFUgSuKopQoqsAVRVFKFFXgiqIoJYoqcEVRlBJFFbiiKEqJ8j+A8VZ6MGEB0wAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.imshow(batch_x[0][:,:,0])\n", + "decoded = ''.join([decode_maps[i] for i in decoded[:,0] if i not in [0,1,2]])\n", + "actual = ''.join([decode_maps[i] for i in batch_y[0] if i not in [0,1,2]])\n", + "plt.title('predict: %s, actual: %s'%(decoded, actual))\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..ad4ef42 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,8 @@ +tensorflow +numpy +scipy +sklearn +scikit-learn +matplotlib +seaborn +pandas diff --git a/speech-to-text/1.tacotron.ipynb b/speech-to-text/1.tacotron.ipynb new file mode 100644 index 0000000..150f5f0 --- /dev/null +++ b/speech-to-text/1.tacotron.ipynb @@ -0,0 +1,956 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "dimension = 400\n", + "vocab = \"ES abcdefghijklmnopqrstuvwxyz'\"\n", + "char2idx = {char: idx for idx, char in enumerate(vocab)}\n", + "idx2char = {idx: char for idx, char in enumerate(vocab)}\n", + "\n", + "def text2idx(text):\n", + " text = re.sub(r'[^a-z ]', '', text.lower()).strip() + 'S'\n", + " converted = [char2idx[char] 
for char in text]\n", + " return text, converted" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of 
numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, 
(1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "train_X, train_Y = [], []\n", + "text_files = [f for f in os.listdir('spectrogram-train') if f.endswith('.npy')]\n", + "for fpath in text_files:\n", + " try:\n", + " splitted = fpath.split('-')\n", + " if len(splitted) == 2:\n", + " splitted[1] = splitted[1].split('.')[1]\n", + " fpath = splitted[0] + '.' + splitted[1]\n", + " with open('data/' + fpath.replace('npy', 'txt')) as fopen:\n", + " text, converted = text2idx(fopen.read())\n", + " w = np.load('spectrogram-train/' + fpath)\n", + " if w.shape[1] != dimension:\n", + " continue\n", + " train_X.append(w)\n", + " train_Y.append(converted)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "test_X, test_Y = [], []\n", + "text_files = [f for f in os.listdir('spectrogram-test') if f.endswith('.npy')]\n", + "for fpath in text_files:\n", + " with open('data/' + fpath.replace('npy', 'txt')) as fopen:\n", + " text, converted = text2idx(fopen.read())\n", + " w = np.load('spectrogram-test/' + fpath)\n", + " if w.shape[1] != dimension:\n", + " continue\n", + " test_X.append(w)\n", + " test_Y.append(converted)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(13128, 560)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(train_X), len(test_X)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "embed_size = 256\n", + "encoder_num_banks = 16\n", + "decoder_num_banks = 8\n", + "num_highway_blocks = 4\n", + "learning_rate = 1e-4\n", + "\n", + "def embed(inputs, vocab_size, dimension, scope = 'embedding', reuse = None):\n", + " with tf.variable_scope(scope, 
reuse = reuse):\n", + " lookup_table = tf.get_variable(\n", + " 'lookup_table',\n", + " dtype = tf.float32,\n", + " shape = [vocab_size, dimension],\n", + " initializer = tf.truncated_normal_initializer(\n", + " mean = 0.0, stddev = 0.01\n", + " ),\n", + " )\n", + " lookup_table = tf.concat(\n", + " (tf.zeros(shape = [1, dimension]), lookup_table[1:, :]), 0\n", + " )\n", + " return tf.nn.embedding_lookup(lookup_table, inputs)\n", + "\n", + "\n", + "def normalize_in(inputs, activation_fn = None, scope = 'normalize_in'):\n", + " with tf.variable_scope(scope):\n", + " batch, steps, channels = inputs.get_shape().as_list()\n", + " var_shape = [channels]\n", + " mu, sigma_sq = tf.nn.moments(inputs, [1], keep_dims = True)\n", + " shift = tf.Variable(tf.zeros(var_shape))\n", + " scale = tf.Variable(tf.ones(var_shape))\n", + " epsilon = 1e-8\n", + " normalized = (inputs - mu) / (sigma_sq + epsilon) ** (0.5)\n", + " outputs = scale * normalized + shift\n", + " if activation_fn:\n", + " outputs = activation_fn(outputs)\n", + " return outputs\n", + "\n", + "\n", + "def conv1d(\n", + " inputs,\n", + " filters = None,\n", + " size = 1,\n", + " rate = 1,\n", + " padding = 'SAME',\n", + " use_bias = False,\n", + " activation_fn = None,\n", + " scope = 'conv1d',\n", + " reuse = None,\n", + "):\n", + " with tf.variable_scope(scope):\n", + " if padding.lower() == 'causal':\n", + " pad_len = (size - 1) * rate\n", + " inputs = tf.pad(inputs, [[0, 0], [pad_len, 0], [0, 0]])\n", + " padding = 'valid'\n", + " if filters is None:\n", + " filters = inputs.get_shape().as_list()[-1]\n", + " params = {\n", + " 'inputs': inputs,\n", + " 'filters': filters,\n", + " 'kernel_size': size,\n", + " 'dilation_rate': rate,\n", + " 'padding': padding,\n", + " 'activation': activation_fn,\n", + " 'use_bias': use_bias,\n", + " 'reuse': reuse,\n", + " }\n", + " outputs = tf.layers.conv1d(**params)\n", + " return outputs\n", + "\n", + "\n", + "def conv1d_banks(\n", + " inputs, K = 16, is_training = True, 
scope = 'conv1d_banks', reuse = None\n", + "):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " outputs = conv1d(inputs, embed_size // 2, 1)\n", + " outputs = normalize_in(outputs, tf.nn.relu)\n", + " for k in range(2, K + 1):\n", + " with tf.variable_scope('num_%d' % (k)):\n", + " output = conv1d(inputs, embed_size // 2, k)\n", + " output = normalize_in(output, tf.nn.relu)\n", + " outputs = tf.concat((outputs, output), -1)\n", + " return outputs\n", + "\n", + "\n", + "def gru(inputs, units = None, bidirection = False, scope = 'gru', reuse = None):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " if units is None:\n", + " units = inputs.get_shape().as_list()[-1]\n", + " cell = tf.contrib.rnn.GRUCell(units)\n", + " if bidirection:\n", + " cell_bw = tf.contrib.rnn.GRUCell(units)\n", + " outputs, _ = tf.nn.bidirectional_dynamic_rnn(\n", + " cell, cell_bw, inputs, dtype = tf.float32\n", + " )\n", + " return tf.concat(outputs, 2)\n", + " else:\n", + " outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype = tf.float32)\n", + " return outputs\n", + "\n", + "\n", + "def attention_decoder(\n", + " inputs, memory, units = None, scope = 'attention_decoder', reuse = None\n", + "):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " if units is None:\n", + " units = inputs.get_shape().as_list()[-1]\n", + " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(\n", + " units, memory\n", + " )\n", + " decoder_cell = tf.contrib.rnn.GRUCell(units)\n", + " cell_with_attention = tf.contrib.seq2seq.AttentionWrapper(\n", + " decoder_cell, attention_mechanism, units\n", + " )\n", + " outputs, _ = tf.nn.dynamic_rnn(\n", + " cell_with_attention, inputs, dtype = tf.float32\n", + " )\n", + " return outputs\n", + "\n", + "\n", + "def prenet(inputs, is_training = True, scope = 'prenet', reuse = None):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " outputs = tf.layers.dense(\n", + " inputs, units = embed_size, activation = tf.nn.relu, name = 
'dense1'\n", + " )\n", + " outputs = tf.nn.dropout(\n", + " outputs,\n", + " keep_prob = 0.5 if is_training == True else 1.0,\n", + " name = 'dropout1',\n", + " )\n", + " outputs = tf.layers.dense(\n", + " outputs,\n", + " units = embed_size // 2,\n", + " activation = tf.nn.relu,\n", + " name = 'dense2',\n", + " )\n", + " outputs = tf.nn.dropout(\n", + " outputs,\n", + " keep_prob = 0.5 if is_training == True else 1.0,\n", + " name = 'dropout2',\n", + " )\n", + " return outputs\n", + "\n", + "\n", + "def highwaynet(inputs, units = None, scope = 'highwaynet', reuse = None):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " if units is None:\n", + " units = inputs.get_shape().as_list()[-1]\n", + " H = tf.layers.dense(\n", + " inputs, units = units, activation = tf.nn.relu, name = 'dense1'\n", + " )\n", + " T = tf.layers.dense(\n", + " inputs, units = units, activation = tf.nn.sigmoid, name = 'dense2'\n", + " )\n", + " C = 1.0 - T\n", + " return H * T + inputs * C\n", + "\n", + "\n", + "def shift_by_one(inputs):\n", + " return tf.concat((tf.zeros_like(inputs[:, :1]), inputs[:, :-1]), 1)\n", + "\n", + "def encode(inputs, is_training = True, scope = 'encoder', reuse = None):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " prenet_out = prenet(inputs, scope = 'prenet', is_training = is_training)\n", + " enc = conv1d_banks(\n", + " prenet_out, K = encoder_num_banks, is_training = is_training\n", + " )\n", + " enc = tf.layers.max_pooling1d(enc, 2, 1, padding = 'same')\n", + " enc = conv1d(enc, embed_size // 2, 3, scope = 'conv1d_1')\n", + " enc = normalize_in(enc, activation_fn = tf.nn.relu)\n", + " enc = conv1d(enc, embed_size // 2, 3, scope = 'conv1d_2')\n", + " enc = normalize_in(enc, activation_fn = tf.nn.relu)\n", + " enc += prenet_out\n", + " for i in range(num_highway_blocks):\n", + " enc = highwaynet(\n", + " enc, units = embed_size // 2, scope = 'highwaynet_%d' % (i)\n", + " )\n", + " memory = gru(enc, embed_size // 2, True)\n", + " return 
memory\n", + "\n", + "\n", + "def decode(\n", + " inputs, memory, is_training = True, scope = 'decoder_layers', reuse = None\n", + "):\n", + " with tf.variable_scope(scope, reuse = reuse):\n", + " dec = prenet(inputs, is_training = is_training)\n", + " dec = attention_decoder(dec, memory, embed_size)\n", + " dec += gru(dec, embed_size, False, scope = 'gru1')\n", + " dec += gru(dec, embed_size, False, scope = 'gru2')\n", + " return tf.layers.dense(dec, len(char2idx))\n", + "\n", + "\n", + "class Model:\n", + " def __init__(self, is_training = True):\n", + " self.X = tf.placeholder(\n", + " tf.float32, shape = (None, None, dimension)\n", + " )\n", + " self.Y = tf.placeholder(tf.int32, shape = (None, None))\n", + " self.Y_seq_len = tf.count_nonzero(self.Y, 1, dtype=tf.int32)\n", + " self.decoder_inputs = embed(\n", + " shift_by_one(self.Y), len(char2idx), embed_size\n", + " )\n", + " with tf.variable_scope('net'):\n", + " self.memory = encode(self.X, is_training = is_training)\n", + " self.outputs = decode(\n", + " self.decoder_inputs, self.memory, is_training = is_training\n", + " )\n", + " self.logprobs = tf.log(tf.nn.softmax(self.outputs) + 1e-10)\n", + " self.preds = tf.argmax(self.outputs, axis = -1)\n", + " correct_pred = tf.equal(tf.cast(self.preds, tf.int32), self.Y)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))\n", + " \n", + " masks = tf.sequence_mask(\n", + " self.Y_seq_len,\n", + " tf.reduce_max(self.Y_seq_len),\n", + " dtype = tf.float32,\n", + " )\n", + " self.cost = tf.contrib.seq2seq.sequence_loss(\n", + " logits = self.outputs, targets = self.Y, weights = masks\n", + " )\n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize(\n", + " self.cost\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING: Logging before flag parsing goes to stderr.\n", + "W0830 13:12:56.280405 140415593760576 
deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:507: calling count_nonzero (from tensorflow.python.ops.math_ops) with axis is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "reduction_indices is deprecated, use axis instead\n", + "W0830 13:12:56.310420 140415593760576 deprecation.py:323] From :122: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 13:12:56.314230 140415593760576 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 13:12:56.590281 140415593760576 deprecation.py:506] From :127: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", + "W0830 13:12:56.644309 140415593760576 deprecation.py:323] From :66: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "W0830 13:12:57.293771 140415593760576 deprecation.py:323] From :166: max_pooling1d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.MaxPooling1D instead.\n", + "W0830 13:12:58.725348 140415593760576 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 13:12:58.727220 140415593760576 deprecation.py:323] From :88: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 13:12:58.728699 140415593760576 deprecation.py:323] From :92: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 13:12:58.729964 140415593760576 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + 
"Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0830 13:12:58.806982 140415593760576 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 13:12:58.821044 140415593760576 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 13:13:02.284158 140415593760576 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py:199: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(13128, 48, 400)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")\n", + "train_X.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "(560, 48, 400)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")\n", + "test_X.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "epoch = 20\n", + "batch_size = 64" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:41<00:00, 4.25it/s, accuracy=0.783, cost=0.659]\n", + "minibatch loop: 100%|██████████| 9/9 [00:01<00:00, 6.04it/s, accuracy=0.7, cost=0.684] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:36, 5.64it/s, accuracy=0.73, cost=0.683]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 1, training avg loss 1.272585, training avg acc 0.584240\n", + "epoch 1, testing avg loss 0.683997, testing avg acc 0.721852\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.56it/s, accuracy=0.803, cost=0.525]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.72it/s, accuracy=0.706, cost=0.553]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:36, 5.62it/s, accuracy=0.741, cost=0.552]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"epoch 2, training avg loss 0.612984, training avg acc 0.723206\n", + "epoch 2, testing avg loss 0.565084, testing avg acc 0.731944\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.40it/s, accuracy=0.803, cost=0.515]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.57it/s, accuracy=0.716, cost=0.517]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.30it/s, accuracy=0.753, cost=0.524]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 3, training avg loss 0.542716, training avg acc 0.735273\n", + "epoch 3, testing avg loss 0.527480, testing avg acc 0.740622\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:39<00:00, 5.35it/s, accuracy=0.829, cost=0.478]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.71it/s, accuracy=0.723, cost=0.497]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.36it/s, accuracy=0.761, cost=0.494]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 4, training avg loss 0.510864, training avg acc 0.743509\n", + "epoch 4, testing avg loss 0.505699, testing avg acc 0.745512\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.43it/s, accuracy=0.803, cost=0.478]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.91it/s, accuracy=0.729, cost=0.47] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:37, 5.53it/s, accuracy=0.767, cost=0.467]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 5, training avg loss 0.485896, training avg acc 0.750599\n", + "epoch 5, testing avg loss 0.482513, testing avg acc 0.752478\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.35it/s, accuracy=0.829, cost=0.447]\n", + 
"minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.37it/s, accuracy=0.742, cost=0.449]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:36, 5.54it/s, accuracy=0.768, cost=0.447]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 6, training avg loss 0.462700, training avg acc 0.755660\n", + "epoch 6, testing avg loss 0.461965, testing avg acc 0.756211\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.29it/s, accuracy=0.822, cost=0.408]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.07it/s, accuracy=0.737, cost=0.44] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:39, 5.18it/s, accuracy=0.776, cost=0.422]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 7, training avg loss 0.441943, training avg acc 0.761259\n", + "epoch 7, testing avg loss 0.445603, testing avg acc 0.761199\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.43it/s, accuracy=0.849, cost=0.379]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.29it/s, accuracy=0.748, cost=0.41] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.28it/s, accuracy=0.787, cost=0.4]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 8, training avg loss 0.419698, training avg acc 0.767303\n", + "epoch 8, testing avg loss 0.426653, testing avg acc 0.765520\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.64it/s, accuracy=0.842, cost=0.365]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.72it/s, accuracy=0.75, cost=0.394] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:40, 5.07it/s, accuracy=0.796, cost=0.368]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 9, training avg loss 0.398379, training avg acc 
0.773561\n", + "epoch 9, testing avg loss 0.413055, testing avg acc 0.768295\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.42it/s, accuracy=0.862, cost=0.336]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.25it/s, accuracy=0.746, cost=0.408]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:39, 5.19it/s, accuracy=0.807, cost=0.354]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 10, training avg loss 0.378367, training avg acc 0.779997\n", + "epoch 10, testing avg loss 0.411091, testing avg acc 0.771466\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.30it/s, accuracy=0.849, cost=0.336]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.29it/s, accuracy=0.745, cost=0.402]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.32it/s, accuracy=0.801, cost=0.337]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 11, training avg loss 0.358436, training avg acc 0.785431\n", + "epoch 11, testing avg loss 0.402924, testing avg acc 0.772684\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.22it/s, accuracy=0.849, cost=0.308]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.25it/s, accuracy=0.752, cost=0.395]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:37, 5.45it/s, accuracy=0.811, cost=0.32]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 12, training avg loss 0.338979, training avg acc 0.791071\n", + "epoch 12, testing avg loss 0.415512, testing avg acc 0.772348\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.45it/s, accuracy=0.855, cost=0.281]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 
14.20it/s, accuracy=0.74, cost=0.42] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.30it/s, accuracy=0.818, cost=0.29]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 13, training avg loss 0.319973, training avg acc 0.796918\n", + "epoch 13, testing avg loss 0.421690, testing avg acc 0.771532\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.32it/s, accuracy=0.875, cost=0.254]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.27it/s, accuracy=0.744, cost=0.432]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.35it/s, accuracy=0.818, cost=0.28]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 14, training avg loss 0.299559, training avg acc 0.802903\n", + "epoch 14, testing avg loss 0.439885, testing avg acc 0.772849\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.77it/s, accuracy=0.882, cost=0.257]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.40it/s, accuracy=0.743, cost=0.453]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.32it/s, accuracy=0.823, cost=0.265]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 15, training avg loss 0.279262, training avg acc 0.807981\n", + "epoch 15, testing avg loss 0.464145, testing avg acc 0.772904\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.52it/s, accuracy=0.914, cost=0.206]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.88it/s, accuracy=0.743, cost=0.476]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.27it/s, accuracy=0.831, cost=0.241]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 16, training avg loss 0.262802, training avg acc 0.813376\n", + "epoch 16, testing avg loss 0.478891, 
testing avg acc 0.773563\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.38it/s, accuracy=0.895, cost=0.209]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.17it/s, accuracy=0.744, cost=0.499]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:38, 5.36it/s, accuracy=0.837, cost=0.233]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 17, training avg loss 0.244635, training avg acc 0.818782\n", + "epoch 17, testing avg loss 0.519443, testing avg acc 0.772614\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:39<00:00, 5.48it/s, accuracy=0.882, cost=0.22] \n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.45it/s, accuracy=0.744, cost=0.538]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:36, 5.56it/s, accuracy=0.83, cost=0.217]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 18, training avg loss 0.232706, training avg acc 0.822359\n", + "epoch 18, testing avg loss 0.540628, testing avg acc 0.771803\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.41it/s, accuracy=0.914, cost=0.169]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.63it/s, accuracy=0.743, cost=0.552]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:37, 5.46it/s, accuracy=0.833, cost=0.219]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 19, training avg loss 0.219639, training avg acc 0.827012\n", + "epoch 19, testing avg loss 0.558029, testing avg acc 0.771475\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:38<00:00, 5.42it/s, accuracy=0.901, cost=0.177]\n", + "minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.38it/s, accuracy=0.74, cost=0.579] " + ] + }, + { + 
"name": "stdout", +     "output_type": "stream", +     "text": [ +      "epoch 20, training avg loss 0.209063, training avg acc 0.829762\n", +      "epoch 20, testing avg loss 0.570628, testing avg acc 0.770907\n" +     ] +    }, +    { +     "name": "stderr", +     "output_type": "stream", +     "text": [ +      "\n" +     ] +    } +   ], +   "source": [ +    "from tqdm import tqdm\n", +    "\n", +    "for e in range(epoch):\n", +    "    pbar = tqdm(\n", +    "        range(0, len(train_X), batch_size), desc = 'minibatch loop')\n", +    "    train_loss, train_acc, test_loss, test_acc = [], [], [], []\n", +    "    for i in pbar:\n", +    "        index = min(i + batch_size, len(train_X))\n", +    "        batch_x = train_X[i : index]\n", +    "        y = train_Y[i : index]\n", +    "        batch_y, _ = pad_sentence_batch(y, 0)\n", +    "        feed = {model.X: batch_x,\n", +    "                model.Y: batch_y}\n", +    "        accuracy, loss, _ = sess.run([model.accuracy,model.cost,model.optimizer],\n", +    "                                    feed_dict = feed)\n", +    "        train_loss.append(loss)\n", +    "        train_acc.append(accuracy)\n", +    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n", +    "    \n", +    "    \n", +    "    pbar = tqdm(\n", +    "        range(0, len(test_X), batch_size), desc = 'minibatch loop')\n", +    "    for i in pbar:\n", +    "        index = min(i + batch_size, len(test_X))\n", +    "        batch_x = test_X[i : index]\n", +    "        y = test_Y[i : index]\n", +    "        batch_y, _ = pad_sentence_batch(y, 0)\n", +    "        feed = {model.X: batch_x,\n", +    "                model.Y: batch_y}\n", +    "        accuracy, loss = sess.run([model.accuracy,model.cost],\n", +    "                                    feed_dict = feed)\n", +    "\n", +    "        test_loss.append(loss)\n", +    "        test_acc.append(accuracy)\n", +    "        pbar.set_postfix(cost = loss, accuracy = accuracy)\n", +    "    \n", +    "    print('epoch %d, training avg loss %f, training avg acc %f'%(e+1,\n", +    "                                                                  np.mean(train_loss),np.mean(train_acc)))\n", +    "    print('epoch %d, testing avg loss %f, testing avg acc %f'%(e+1,\n", +    "                                                                np.mean(test_loss),np.mean(test_acc)))" +   ] +  }, +  { +   "cell_type": "code", +   "execution_count": 13, +   "metadata": {}, +   "outputs": [], +   "source": [ +    "empty_y = np.zeros((1, len(batch_y[0])))\n",
"predicted = ''.join(\n", + " [\n", + " idx2char[c]\n", + " for c in sess.run(\n", + " model.preds, feed_dict = {model.X: batch_x[:1], model.Y: empty_y}\n", + " )[0]\n", + " if idx2char[c] not in ['S', 'E']\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "predicted: syytteeword jooe, ground truth: say the word tool\n" + ] + } + ], + "source": [ + "ground_truth = ''.join(\n", + " [idx2char[c] for c in batch_y[0] if idx2char[c] not in ['S', 'E']]\n", + ")\n", + "print('predicted: %s, ground truth: %s' % (predicted, ground_truth))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/1.tacotron/README.md b/speech-to-text/1.tacotron/README.md deleted file mode 100644 index 9824199..0000000 --- a/speech-to-text/1.tacotron/README.md +++ /dev/null @@ -1,79 +0,0 @@ -## How-to - -Make sure you already run [download.ipynb](../download.ipynb) - -1. Run [caching.py](caching.py), -```bash -python3 caching.py -``` - -2. 
Run [train.py](train.py), -```bash -python3 train.py -``` - -Output after 30 epochs, -```text -minibatch loop: 100%|█████████████████████████████████████████████████████████████████████████████████| 88/88 [00:33<00:00, 2.42it/s, accuracy=0.763, cost=1.2] -epoch 1, avg loss 1.980871, avg acc 0.407110 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.44it/s, accuracy=0.757, cost=0.769] -epoch 2, avg loss 0.915204, avg acc 0.730742 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.63it/s, accuracy=0.743, cost=0.698] -epoch 3, avg loss 0.715623, avg acc 0.736387 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.58it/s, accuracy=0.766, cost=0.633] -epoch 4, avg loss 0.658667, avg acc 0.741199 -minibatch loop: 100%|████████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.53it/s, accuracy=0.76, cost=0.597] -epoch 5, avg loss 0.609283, avg acc 0.747445 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.41it/s, accuracy=0.776, cost=0.554] -epoch 6, avg loss 0.578279, avg acc 0.752976 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.41it/s, accuracy=0.776, cost=0.537] -epoch 7, avg loss 0.560772, avg acc 0.755184 -minibatch loop: 100%|████████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.49it/s, accuracy=0.783, cost=0.53] -epoch 8, avg loss 0.545729, avg acc 0.757628 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.44it/s, accuracy=0.786, cost=0.518] -epoch 9, avg loss 0.532329, avg acc 0.760964 -minibatch loop: 
100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.41it/s, accuracy=0.796, cost=0.508] -epoch 10, avg loss 0.523550, avg acc 0.762647 -minibatch loop: 100%|████████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.25it/s, accuracy=0.78, cost=0.498] -epoch 11, avg loss 0.514580, avg acc 0.765008 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.33it/s, accuracy=0.803, cost=0.485] -epoch 12, avg loss 0.504837, avg acc 0.767789 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.23it/s, accuracy=0.806, cost=0.465] -epoch 13, avg loss 0.497120, avg acc 0.769682 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.45it/s, accuracy=0.799, cost=0.459] -epoch 14, avg loss 0.490377, avg acc 0.771292 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.50it/s, accuracy=0.796, cost=0.451] -epoch 15, avg loss 0.483348, avg acc 0.771440 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.61it/s, accuracy=0.803, cost=0.448] -epoch 16, avg loss 0.474057, avg acc 0.774800 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.55it/s, accuracy=0.809, cost=0.434] -epoch 17, avg loss 0.467558, avg acc 0.776787 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.43it/s, accuracy=0.809, cost=0.428] -epoch 18, avg loss 0.459665, avg acc 0.778445 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.27it/s, 
accuracy=0.809, cost=0.423] -epoch 19, avg loss 0.453300, avg acc 0.780418 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.41it/s, accuracy=0.816, cost=0.411] -epoch 20, avg loss 0.446304, avg acc 0.782334 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.50it/s, accuracy=0.822, cost=0.405] -epoch 21, avg loss 0.438987, avg acc 0.783284 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.54it/s, accuracy=0.809, cost=0.385] -epoch 22, avg loss 0.433176, avg acc 0.784649 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.26it/s, accuracy=0.816, cost=0.398] -epoch 23, avg loss 0.425066, avg acc 0.787474 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.57it/s, accuracy=0.826, cost=0.377] -epoch 24, avg loss 0.419044, avg acc 0.788965 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.41it/s, accuracy=0.816, cost=0.368] -epoch 25, avg loss 0.411709, avg acc 0.791973 -minibatch loop: 100%|████████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.56it/s, accuracy=0.822, cost=0.37] -epoch 26, avg loss 0.404417, avg acc 0.794112 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.57it/s, accuracy=0.829, cost=0.377] -epoch 27, avg loss 0.399927, avg acc 0.795907 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.40it/s, accuracy=0.826, cost=0.357] -epoch 28, avg loss 0.392872, avg acc 0.797287 -minibatch loop: 
100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:20<00:00, 4.62it/s, accuracy=0.839, cost=0.354] -epoch 29, avg loss 0.385755, avg acc 0.800487 -minibatch loop: 100%|███████████████████████████████████████████████████████████████████████████████| 88/88 [00:21<00:00, 4.43it/s, accuracy=0.822, cost=0.351] -epoch 30, avg loss 0.381007, avg acc 0.802114 - -predicted: saytthe word sat, ground truth: say the word fat -``` diff --git a/speech-to-text/1.tacotron/caching.py b/speech-to-text/1.tacotron/caching.py deleted file mode 100644 index 54a5118..0000000 --- a/speech-to-text/1.tacotron/caching.py +++ /dev/null @@ -1,13 +0,0 @@ -import tqdm -import os -import numpy as np -from setting import path, load_file - -if not os.path.exists('spectrogram'): - os.mkdir('spectrogram') - -wav_files = [f for f in os.listdir(path) if f.endswith('.wav')] - -for fpath in tqdm.tqdm(wav_files): - fname, spectrogram = load_file(path + fpath) - np.save('spectrogram/{}'.format(fname.replace('wav', 'npy')), spectrogram) diff --git a/speech-to-text/1.tacotron/model.py b/speech-to-text/1.tacotron/model.py deleted file mode 100644 index 69035b7..0000000 --- a/speech-to-text/1.tacotron/model.py +++ /dev/null @@ -1,85 +0,0 @@ -from setting import ( - embed_size, - char2idx, - learning_rate, - encoder_num_banks, - n_mels, - reduction_factor, - num_highway_blocks, -) -from modules import ( - conv1d, - normalize_in, - highwaynet, - gru, - attention_decoder, - prenet, - embed, - shift_by_one, - conv1d_banks, -) -import tensorflow as tf - - -def encode(inputs, is_training = True, scope = 'encoder', reuse = None): - with tf.variable_scope(scope, reuse = reuse): - prenet_out = prenet(inputs, scope = 'prenet', is_training = is_training) - enc = conv1d_banks( - prenet_out, K = encoder_num_banks, is_training = is_training - ) - enc = tf.layers.max_pooling1d(enc, 2, 1, padding = 'same') - enc = conv1d(enc, embed_size // 2, 3, scope = 'conv1d_1') - enc = 
normalize_in(enc, activation_fn = tf.nn.relu) - enc = conv1d(enc, embed_size // 2, 3, scope = 'conv1d_2') - enc = normalize_in(enc, activation_fn = tf.nn.relu) - enc += prenet_out - for i in range(num_highway_blocks): - enc = highwaynet( - enc, units = embed_size // 2, scope = 'highwaynet_%d' % (i) - ) - memory = gru(enc, embed_size // 2, True) - return memory - - -def decode( - inputs, memory, is_training = True, scope = 'decoder_layers', reuse = None -): - with tf.variable_scope(scope, reuse = reuse): - dec = prenet(inputs, is_training = is_training) - dec = attention_decoder(dec, memory, embed_size) - dec += gru(dec, embed_size, False, scope = 'gru1') - dec += gru(dec, embed_size, False, scope = 'gru2') - return tf.layers.dense(dec, len(char2idx)) - - -class Model: - def __init__(self, is_training = True): - self.X = tf.placeholder( - tf.float32, shape = (None, None, n_mels * reduction_factor) - ) - self.Y = tf.placeholder(tf.int32, shape = (None, None)) - self.Y_seq_len = tf.placeholder(tf.int32, [None]) - self.decoder_inputs = embed( - shift_by_one(self.Y), len(char2idx), embed_size - ) - with tf.variable_scope('net'): - self.memory = encode(self.X, is_training = is_training) - self.outputs = decode( - self.decoder_inputs, self.memory, is_training = is_training - ) - self.logprobs = tf.log(tf.nn.softmax(self.outputs) + 1e-10) - self.preds = tf.argmax(self.outputs, axis = -1) - correct_pred = tf.equal(tf.cast(self.preds, tf.int32), self.Y) - self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32)) - if is_training: - masks = tf.sequence_mask( - self.Y_seq_len, - tf.reduce_max(self.Y_seq_len), - dtype = tf.float32, - ) - self.cost = tf.contrib.seq2seq.sequence_loss( - logits = self.outputs, targets = self.Y, weights = masks - ) - self.optimizer = tf.train.AdamOptimizer(learning_rate).minimize( - self.cost - ) diff --git a/speech-to-text/1.tacotron/modules.py b/speech-to-text/1.tacotron/modules.py deleted file mode 100644 index a50b79a..0000000 --- 
a/speech-to-text/1.tacotron/modules.py +++ /dev/null @@ -1,214 +0,0 @@ -from setting import embed_size -import tensorflow as tf - - -def embed(inputs, vocab_size, dimension, scope = 'embedding', reuse = None): - with tf.variable_scope(scope, reuse = reuse): - lookup_table = tf.get_variable( - 'lookup_table', - dtype = tf.float32, - shape = [vocab_size, dimension], - initializer = tf.truncated_normal_initializer( - mean = 0.0, stddev = 0.01 - ), - ) - lookup_table = tf.concat( - (tf.zeros(shape = [1, dimension]), lookup_table[1:, :]), 0 - ) - return tf.nn.embedding_lookup(lookup_table, inputs) - - -def normalize_bn( - inputs, - decay = 0.99, - is_training = True, - activation_fn = None, - scope = 'normalize_bn', -): - inputs_shape = inputs.get_shape() - inputs_rank = inputs_shape.ndims - if inputs_rank in [2, 3, 4]: - if inputs_rank == 2: - inputs = tf.expand_dims(inputs, axis = 1) - inputs = tf.expand_dims(inputs, axis = 2) - elif inputs_rank == 3: - inputs = tf.expand_dims(inputs, axis = 1) - outputs = tf.contrib.layers.batch_norm( - inputs = inputs, - decay = decay, - center = True, - scale = True, - activation_fn = activation_fn, - updates_collections = None, - is_training = is_training, - scope = scope, - zero_debias_moving_mean = True, - fused = True, - ) - if inputs_rank == 2: - outputs = tf.squeeze(outputs, axis = [1, 2]) - elif inputs_rank == 3: - outputs = tf.squeeze(outputs, axis = 1) - else: - outputs = tf.contrib.layers.batch_norm( - inputs = inputs, - decay = decay, - center = True, - scale = True, - activation_fn = activation_fn, - updates_collections = None, - is_training = is_training, - scope = scope, - fused = False, - ) - return outputs - - -def normalize_layer_norm( - inputs, activation_fn = None, scope = 'normalize_layer_norm' -): - return tf.contrib.layers.layer_norm( - inputs = inputs, - center = True, - scale = True, - activation_fn = activation_fn, - scope = scope, - ) - - -def normalize_in(inputs, activation_fn = None, scope = 
'normalize_in'): - with tf.variable_scope(scope): - batch, steps, channels = inputs.get_shape().as_list() - var_shape = [channels] - mu, sigma_sq = tf.nn.moments(inputs, [1], keep_dims = True) - shift = tf.Variable(tf.zeros(var_shape)) - scale = tf.Variable(tf.ones(var_shape)) - epsilon = 1e-8 - normalized = (inputs - mu) / (sigma_sq + epsilon) ** (0.5) - outputs = scale * normalized + shift - if activation_fn: - outputs = activation_fn(outputs) - return outputs - - -def conv1d( - inputs, - filters = None, - size = 1, - rate = 1, - padding = 'SAME', - use_bias = False, - activation_fn = None, - scope = 'conv1d', - reuse = None, -): - with tf.variable_scope(scope): - if padding.lower() == 'causal': - pad_len = (size - 1) * rate - inputs = tf.pad(inputs, [[0, 0], [pad_len, 0], [0, 0]]) - padding = 'valid' - if filters is None: - filters = inputs.get_shape().as_list()[-1] - params = { - 'inputs': inputs, - 'filters': filters, - 'kernel_size': size, - 'dilation_rate': rate, - 'padding': padding, - 'activation': activation_fn, - 'use_bias': use_bias, - 'reuse': reuse, - } - outputs = tf.layers.conv1d(**params) - return outputs - - -def conv1d_banks( - inputs, K = 16, is_training = True, scope = 'conv1d_banks', reuse = None -): - with tf.variable_scope(scope, reuse = reuse): - outputs = conv1d(inputs, embed_size // 2, 1) - outputs = normalize_in(outputs, tf.nn.relu) - for k in range(2, K + 1): - with tf.variable_scope('num_%d' % (k)): - output = conv1d(inputs, embed_size // 2, k) - output = normalize_in(output, tf.nn.relu) - outputs = tf.concat((outputs, output), -1) - return outputs - - -def gru(inputs, units = None, bidirection = False, scope = 'gru', reuse = None): - with tf.variable_scope(scope, reuse = reuse): - if units is None: - units = inputs.get_shape().as_list()[-1] - cell = tf.contrib.rnn.GRUCell(units) - if bidirection: - cell_bw = tf.contrib.rnn.GRUCell(units) - outputs, _ = tf.nn.bidirectional_dynamic_rnn( - cell, cell_bw, inputs, dtype = tf.float32 - ) - 
return tf.concat(outputs, 2) - else: - outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype = tf.float32) - return outputs - - -def attention_decoder( - inputs, memory, units = None, scope = 'attention_decoder', reuse = None -): - with tf.variable_scope(scope, reuse = reuse): - if units is None: - units = inputs.get_shape().as_list()[-1] - attention_mechanism = tf.contrib.seq2seq.BahdanauAttention( - units, memory - ) - decoder_cell = tf.contrib.rnn.GRUCell(units) - cell_with_attention = tf.contrib.seq2seq.AttentionWrapper( - decoder_cell, attention_mechanism, units - ) - outputs, _ = tf.nn.dynamic_rnn( - cell_with_attention, inputs, dtype = tf.float32 - ) - return outputs - - -def prenet(inputs, is_training = True, scope = 'prenet', reuse = None): - with tf.variable_scope(scope, reuse = reuse): - outputs = tf.layers.dense( - inputs, units = embed_size, activation = tf.nn.relu, name = 'dense1' - ) - outputs = tf.nn.dropout( - outputs, - keep_prob = 0.5 if is_training == True else 1.0, - name = 'dropout1', - ) - outputs = tf.layers.dense( - outputs, - units = embed_size // 2, - activation = tf.nn.relu, - name = 'dense2', - ) - outputs = tf.nn.dropout( - outputs, - keep_prob = 0.5 if is_training == True else 1.0, - name = 'dropout2', - ) - return outputs - - -def highwaynet(inputs, units = None, scope = 'highwaynet', reuse = None): - with tf.variable_scope(scope, reuse = reuse): - if units is None: - units = inputs.get_shape().as_list()[-1] - H = tf.layers.dense( - inputs, units = units, activation = tf.nn.relu, name = 'dense1' - ) - T = tf.layers.dense( - inputs, units = units, activation = tf.nn.sigmoid, name = 'dense2' - ) - C = 1.0 - T - return H * T + inputs * C - - -def shift_by_one(inputs): - return tf.concat((tf.zeros_like(inputs[:, :1]), inputs[:, :-1]), 1) diff --git a/speech-to-text/1.tacotron/setting.py b/speech-to-text/1.tacotron/setting.py deleted file mode 100644 index dd5e7ac..0000000 --- a/speech-to-text/1.tacotron/setting.py +++ /dev/null @@ -1,70 
+0,0 @@ -path = '../data/' -max_len = 100 -sampling_rate = 22050 -n_fft = 2048 -frame_shift = 0.0125 -frame_length = 0.05 -hop_length = int(sampling_rate * frame_shift) -win_length = int(sampling_rate * frame_length) -n_mels = 80 - -embed_size = 256 -encoder_num_banks = 16 -decoder_num_banks = 8 -num_highway_blocks = 4 -reduction_factor = 5 - -learning_rate = 1e-4 -batch_size = 32 - -vocab = "ES abcdefghijklmnopqrstuvwxyz'" -char2idx = {char: idx for idx, char in enumerate(vocab)} -idx2char = {idx: char for idx, char in enumerate(vocab)} - -import re -import os - - -def text2idx(text): - text = re.sub(r'[^a-z ]', '', text.lower()).strip() + 'S' - converted = [char2idx[char] for char in text] - return text, converted - - -import librosa -import numpy as np - - -def get_spectrogram(fpath): - y, sr = librosa.load(fpath, sr = sampling_rate) - D = librosa.stft( - y = y, n_fft = n_fft, hop_length = hop_length, win_length = win_length - ) - magnitude = np.abs(D) - power = magnitude ** 2 - S = librosa.feature.melspectrogram(S = power, n_mels = n_mels) - return np.transpose(S.astype(np.float32)) - - -def reduce_frames(x, r_factor): - T, C = x.shape - num_paddings = reduction_factor - (T % r_factor) if T % r_factor != 0 else 0 - padded = np.pad(x, [[0, num_paddings], [0, 0]], 'constant') - return np.reshape(padded, (-1, C * r_factor)) - - -def restore_shape(x, r_factor): - N, _, C = x.shape - return x.reshape((N, -1, C // r_factor)) - - -def load_file(path): - fname = os.path.basename(path) - spectrogram = get_spectrogram(path) - spectrogram = reduce_frames(spectrogram, reduction_factor) - return fname, spectrogram - - -def get_cached(path): - spectrogram = 'spectrogram/' + path + '.npy' - return np.load(spectrogram) diff --git a/speech-to-text/1.tacotron/train.py b/speech-to-text/1.tacotron/train.py deleted file mode 100644 index 3c50e80..0000000 --- a/speech-to-text/1.tacotron/train.py +++ /dev/null @@ -1,109 +0,0 @@ -# coding: utf-8 - -# In[1]: - - -import tensorflow as 
tf -from setting import ( - text2idx, - get_cached, - batch_size, - n_mels, - reduction_factor, - idx2char, -) -from tqdm import tqdm -import numpy as np -import os - - -# In[2]: - - -paths, lengths, texts = [], [], [] -text_files = [f for f in os.listdir('spectrogram') if f.endswith('.npy')] -for fpath in text_files: - with open('../data/' + fpath.replace('npy', 'txt')) as fopen: - text, converted = text2idx(fopen.read()) - texts.append(converted) - lengths.append(len(text)) - paths.append(fpath.replace('.npy', '')) - - -# In[3]: - - -def dynamic_batching(paths): - spectrograms, max_x = [], 0 - for path in paths: - spectrograms.append(np.load('spectrogram/' + path + '.npy')) - if spectrograms[-1].shape[0] > max_x: - max_x = spectrograms[-1].shape[0] - return spectrograms, max_x - - -# In[4]: - - -from model import Model - -tf.reset_default_graph() -sess = tf.InteractiveSession() -model = Model() -sess.run(tf.global_variables_initializer()) - - -# In[5]: - - -for e in range(30): - pbar = tqdm(range(0, len(text_files), batch_size), desc = 'minibatch loop') - total_cost, total_acc = 0, 0 - for k in pbar: - index = min(k + batch_size, len(text_files)) - files, max_x = dynamic_batching(paths[k:index]) - max_y = max(lengths[k:index]) - batch_x = np.zeros((len(files), max_x, n_mels * reduction_factor)) - batch_y = np.zeros((len(files), max_y)) - for n in range(len(files)): - batch_x[n] = np.pad( - files[n], - ((max_x - files[n].shape[0], 0), (0, 0)), - mode = 'constant', - ) - batch_y[n] = np.pad( - texts[k + n], - ((0, max_y - len(texts[k + n]))), - mode = 'constant', - ) - _, acc, cost = sess.run( - [model.optimizer, model.accuracy, model.cost], - feed_dict = { - model.X: batch_x, - model.Y: batch_y, - model.Y_seq_len: lengths[k:index], - }, - ) - total_cost += cost - total_acc += acc - pbar.set_postfix(cost = cost, accuracy = acc) - total_cost /= len(text_files) / batch_size - total_acc /= len(text_files) / batch_size - - print('epoch %d, avg loss %f, avg acc %f' % (e 
+ 1, total_cost, total_acc)) - - -empty_y = np.zeros((1, len(batch_y[0]))) -predicted = ''.join( - [ - idx2char[c] - for c in sess.run( - model.preds, feed_dict = {model.X: batch_x[:1], model.Y: empty_y} - )[0] - if idx2char[c] not in ['S', 'E'] - ] -) -ground_truth = ''.join( - [idx2char[c] for c in batch_y[0] if idx2char[c] not in ['S', 'E']] -) -print('predicted: %s, ground truth: %s' % (predicted, ground_truth)) diff --git a/speech-to-text/10.deep-speech2.ipynb b/speech-to-text/10.deep-speech2.ipynb new file mode 100644 index 0000000..4db1e65 --- /dev/null +++ b/speech-to-text/10.deep-speech2.ipynb @@ -0,0 +1,788 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "\n", + "dimension = 400\n", + "vocab = \"EOS abcdefghijklmnopqrstuvwxyz'\"\n", + "char2idx = {char: idx for idx, char in enumerate(vocab)}\n", + "idx2char = {idx: char for idx, char in enumerate(vocab)}\n", + "\n", + "def text2idx(text):\n", + " text = re.sub(r'[^a-z ]', '', text.lower()).strip()\n", + " converted = [char2idx[char] for char in text]\n", + " return text, converted" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "GO = 1\n", + "PAD = 0\n", + "EOS = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "train_X, train_Y = [], []\n", + "text_files = [f for f in os.listdir('spectrogram-train') if f.endswith('.npy')]\n", + "for fpath in text_files:\n", + " try:\n", + " splitted = fpath.split('-')\n", + " if len(splitted) == 2:\n", + " splitted[1] = 
splitted[1].split('.')[1]\n", + " fpath = splitted[0] + '.' + splitted[1]\n", + " with open('data/' + fpath.replace('npy', 'txt')) as fopen:\n", + " text, converted = text2idx(fopen.read())\n", + " w = np.load('spectrogram-train/' + fpath)\n", + " if w.shape[1] != dimension:\n", + " continue\n", + " train_X.append(w)\n", + " train_Y.append(converted)\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "test_X, test_Y = [], []\n", + "text_files = [f for f in os.listdir('spectrogram-test') if f.endswith('.npy')]\n", + "for fpath in text_files:\n", + " with open('data/' + fpath.replace('npy', 'txt')) as fopen:\n", + " text, converted = text2idx(fopen.read())\n", + " w = np.load('spectrogram-test/' + fpath)\n", + " if w.shape[1] != dimension:\n", + " continue\n", + " test_X.append(w)\n", + " test_Y.append(converted)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_second_dim(x, desired_size):\n", + " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", + " return tf.concat([x, padding], 1)\n", + "\n", + "_BATCH_NORM_EPSILON = 1e-5\n", + "_BATCH_NORM_DECAY = 0.997\n", + "_CONV_FILTERS = 32\n", + "\n", + "def batch_norm(inputs, training):\n", + " return tf.layers.batch_normalization(\n", + " inputs=inputs, momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON,\n", + " fused=True, training=training)\n", + "\n", + "def _conv_bn_layer(inputs, padding, filters, kernel_size, strides, layer_id,\n", + " training):\n", + " inputs = tf.pad(\n", + " inputs,\n", + " [[0, 0], [padding[0], padding[0]], [padding[1], padding[1]], [0, 0]])\n", + " inputs = tf.layers.conv2d(\n", + " inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,\n", + " padding=\"valid\", use_bias=False, activation=tf.nn.relu6,\n", + " name=\"cnn_{}\".format(layer_id))\n", + " return inputs\n", + 
" #return batch_norm(inputs, training)\n", + "\n", + "def _rnn_layer(inputs, rnn_cell, rnn_hidden_size, layer_id, is_batch_norm,\n", + " is_bidirectional, training):\n", + " if is_batch_norm:\n", + " inputs = batch_norm(inputs, training)\n", + " \n", + " fw_cell = rnn_cell(num_units=rnn_hidden_size,\n", + " name=\"rnn_fw_{}\".format(layer_id))\n", + " bw_cell = rnn_cell(num_units=rnn_hidden_size,\n", + " name=\"rnn_bw_{}\".format(layer_id))\n", + "\n", + " if is_bidirectional:\n", + " outputs, _ = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw=fw_cell, cell_bw=bw_cell, inputs=inputs, dtype=tf.float32,\n", + " swap_memory=True)\n", + " rnn_outputs = tf.concat(outputs, -1)\n", + " else:\n", + " rnn_outputs, _ = tf.nn.dynamic_rnn(\n", + " fw_cell, inputs, dtype=tf.float32, swap_memory=True)\n", + "\n", + " return rnn_outputs\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " size_layers,\n", + " learning_rate,\n", + " num_features,\n", + " dropout = 1.0,\n", + " ):\n", + " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", + " self.label = tf.placeholder(tf.int32, [None, None])\n", + " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", + " self.training = tf.placeholder(tf.bool, None)\n", + " self.Y = tf.sparse_placeholder(tf.int32)\n", + " x = tf.expand_dims(self.X, -1)\n", + "\n", + " inputs = _conv_bn_layer(\n", + " x, padding=(20, 5), filters=_CONV_FILTERS, kernel_size=(41, 11),\n", + " strides=(2, 2), layer_id=1, training=self.training)\n", + " \n", + " inputs = _conv_bn_layer(\n", + " inputs, padding=(10, 5), filters=_CONV_FILTERS, kernel_size=(21, 11),\n", + " strides=(2, 1), layer_id=2, training=self.training)\n", + " \n", + " batch_size = tf.shape(inputs)[0]\n", + " feat_size = inputs.get_shape().as_list()[2]\n", + " inputs = tf.reshape(\n", + " inputs,\n", + " [batch_size, -1, feat_size * _CONV_FILTERS // 4])\n", + " print(inputs)\n", + " \n", + " seq_lens = tf.count_nonzero(\n", + " tf.reduce_sum(inputs, -1), 1, dtype 
= tf.int32\n", + " ) + 30\n", + " filled = tf.fill(tf.shape(seq_lens), tf.shape(inputs)[1])\n", + " seq_lens = tf.where(seq_lens > tf.shape(inputs)[1], filled, seq_lens)\n", + " \n", + " rnn_cell = tf.nn.rnn_cell.GRUCell\n", + " for layer_counter in range(5):\n", + " is_batch_norm = (layer_counter != 0)\n", + " inputs = _rnn_layer(\n", + " inputs, rnn_cell, size_layers, layer_counter + 1,\n", + " False, True, self.training)\n", + " \n", + "\n", + " logits = tf.layers.dense(inputs, len(vocab))\n", + " self.logits = logits\n", + " time_major = tf.transpose(logits, [1, 0, 2])\n", + " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", + " decoded = tf.to_int32(decoded[0])\n", + " self.preds = tf.sparse_tensor_to_dense(decoded)\n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.ctc_loss(\n", + " self.Y,\n", + " time_major,\n", + " seq_lens\n", + " )\n", + " )\n", + " self.optimizer = tf.train.AdamOptimizer(\n", + " learning_rate = learning_rate\n", + " ).minimize(self.cost)\n", + " \n", + " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", + " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", + " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", + " y_t = tf.cast(preds, tf.int32)\n", + " self.prediction = tf.boolean_mask(y_t, masks)\n", + " mask_label = tf.boolean_mask(self.label, masks)\n", + " self.mask_label = mask_label\n", + " correct_pred = tf.equal(self.prediction, mask_label)\n", + " correct_index = tf.cast(correct_pred, tf.float32)\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1735: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. 
You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"Reshape:0\", shape=(?, ?, 1600), dtype=float32)\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-4\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = 
np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [03:06<00:00, 1.22it/s, accuracy=0.755, cost=12.9] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:08<00:00, 1.06it/s, accuracy=0.761, cost=13] \n", + "minibatch loop: 0%| | 0/206 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout 
= 1.0,\n", - " kernel_size = 5\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " x = tf.layers.conv1d(self.X, size_layers, 1)\n", - " for i in range(num_layers):\n", - " dilation_rate = 2 ** i\n", - " pad_sz = (kernel_size - 1) * dilation_rate\n", - " with tf.variable_scope('block_%d'%i):\n", - " x += cnn_block(x, dilation_rate, pad_sz, size_layers, kernel_size)\n", - " print(x)\n", - " \n", - " def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.GRUCell(\n", - " size,\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " \n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layers),\n", - " cell_bw = cells(size_layers),\n", - " inputs = x,\n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d' % (n),\n", - " )\n", - " x = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " logits = tf.layers.dense(x, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " self.dense_decoded = tf.sparse_tensor_to_dense(decoded[0], default_value=-1)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = 
learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tensor(\"block_1/add_2:0\", shape=(?, ?, 128), dtype=float32)\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " 
seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:44<00:00, 1.94it/s, accuracy=0.037, cost=38.3] \n", - "minibatch loop: 0%| | 0/88 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "def 
pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " kernel_size = 5\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " x = tf.layers.conv1d(self.X, size_layers, 1)\n", - " for i in range(3):\n", - " dilation_rate = 2 ** i\n", - " pad_sz = (kernel_size - 1) * dilation_rate\n", - " with tf.variable_scope('block_%d'%i):\n", - " x += cnn_block(x, dilation_rate, pad_sz, size_layers, kernel_size)\n", - " print(x)\n", - " \n", - " def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.GRUCell(\n", - " size,\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " \n", - " for i in range(5):\n", - " with tf.variable_scope('rnn_%d'%i):\n", - " x = layer_norm(x)\n", - " x, _ = tf.nn.dynamic_rnn(cell = cells(size_layers), inputs = x, dtype=tf.float32)\n", - " \n", - " logits = tf.layers.dense(x, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " self.dense_decoded = tf.sparse_tensor_to_dense(decoded[0], default_value=-1)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " 
self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tensor(\"block_2/add_2:0\", shape=(?, ?, 128), dtype=float32)\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " 
max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:37<00:00, 2.20it/s, accuracy=0.0222, cost=33] \n", - "minibatch loop: 0%| | 0/88 [00:00:16: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n"
+ ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :79: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:From :99: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n",
+ "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n",
+ "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n"
+ ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :103: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From :106: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-5\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = RNN(num_layers, size_layers, learning_rate)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "\n", + "checkpoint = 'wav2vec/model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", 
+ " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from wav2vec/model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 256/256 [29:21<00:00, 6.88s/it, accuracy=0, cost=52.2] \n", + "testing minibatch loop: 100%|██████████| 
9/9 [00:16<00:00, 1.78s/it, accuracy=0, cost=51.5]\n", + "minibatch loop: 0%| | 0/256 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "\n", - "def cnn_block(x, dilation_rate, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " x = tf.layers.conv1d(inputs = x,\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " kernel_size = 5\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " x = tf.layers.conv1d(self.X, size_layers, 1)\n", - " for i in range(num_layers):\n", - " dilation_rate = 3 ** i\n", - " pad_sz = (kernel_size - 1) * dilation_rate\n", - " with tf.variable_scope('block_%d'%i):\n", - " x = cnn_block(x, 
dilation_rate, size_layers, kernel_size)\n", - " print(x)\n", - " \n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(x, -1), 1, dtype = tf.int32\n", - " )\n", - " \n", - " logits = tf.layers.dense(x, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " self.dense_decoded = tf.sparse_tensor_to_dense(decoded[0], default_value=-1)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tensor(\"block_0/Relu:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"block_1/Relu:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"block_2/Relu:0\", shape=(?, ?, 128), dtype=float32)\n", - "Tensor(\"block_3/Relu:0\", shape=(?, ?, 128), dtype=float32)\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is 
deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 4\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:56<00:00, 1.59it/s, accuracy=0, cost=45.3] \n", - "minibatch loop: 0%| | 0/88 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - 
"execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - "\n", - " def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " 
state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - "\n", - " features = self.X\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layers),\n", - " cell_bw = cells(size_layers),\n", - " inputs = features,\n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d' % (n),\n", - " )\n", - " features = tf.concat((out_fw, out_bw), 2)\n", - "\n", - " logits = tf.layers.dense(features, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_greedy_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is 
deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-3\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:21<00:00, 4.09it/s, accuracy=0.77, cost=14.2] \n", - "minibatch loop: 0%| | 0/88 [00:00:27: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 10:23:36.370378 140152909862720 deprecation.py:323] From :44: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 10:23:36.371528 140152909862720 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0830 10:23:36.474225 140152909862720 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype 
is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 10:23:37.043940 140152909862720 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0830 10:23:37.572719 140152909862720 deprecation.py:323] From :48: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 10:23:37.577309 140152909862720 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 10:23:37.854312 140152909862720 deprecation.py:323] From :51: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-3\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = 
tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.54it/s, accuracy=0.813, cost=8.69]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 10.01it/s, accuracy=0.791, cost=10.1]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.28it/s, accuracy=0.808, cost=9.41]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 1, training avg cost 14.290586, training avg accuracy 0.714584\n", + "epoch 1, testing avg cost 10.127820, testing avg accuracy 0.794851\n" + ] + 
}, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.36it/s, accuracy=0.906, cost=5.41]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.63it/s, accuracy=0.825, cost=9.7] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.58it/s, accuracy=0.871, cost=5.82]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 2, training avg cost 6.847091, training avg accuracy 0.852864\n", + "epoch 2, testing avg cost 9.722075, testing avg accuracy 0.817583\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.42it/s, accuracy=0.957, cost=2.58]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.48it/s, accuracy=0.829, cost=11] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:35, 5.73it/s, accuracy=0.936, cost=2.62]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 3, training avg cost 3.478095, training avg accuracy 0.924761\n", + "epoch 3, testing avg cost 10.025416, testing avg accuracy 0.830991\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.85it/s, accuracy=0.964, cost=1.54] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 16.89it/s, accuracy=0.835, cost=12.6]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.54it/s, accuracy=0.964, cost=1.73]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 4, training avg cost 1.757180, training avg accuracy 0.962556\n", + "epoch 4, testing avg cost 11.058598, testing avg accuracy 0.836111\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.57it/s, accuracy=0.964, cost=0.91] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.39it/s, accuracy=0.841, 
cost=13.5]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.44it/s, accuracy=0.981, cost=0.927]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 5, training avg cost 0.881040, training avg accuracy 0.981411\n", + "epoch 5, testing avg cost 11.581189, testing avg accuracy 0.845335\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.48it/s, accuracy=1, cost=0.25] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.86it/s, accuracy=0.836, cost=14.1]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.22it/s, accuracy=0.992, cost=0.523]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 6, training avg cost 0.477564, training avg accuracy 0.990754\n", + "epoch 6, testing avg cost 12.383659, testing avg accuracy 0.846003\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.55it/s, accuracy=0.978, cost=0.67] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.33it/s, accuracy=0.842, cost=14.3]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:33, 6.21it/s, accuracy=0.998, cost=0.238]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 7, training avg cost 0.323579, training avg accuracy 0.993756\n", + "epoch 7, testing avg cost 12.912425, testing avg accuracy 0.844236\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.33it/s, accuracy=1, cost=0.175] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.50it/s, accuracy=0.837, cost=16.2]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.42it/s, accuracy=1, cost=0.124]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 8, training avg cost 0.263046, training avg accuracy 0.994518\n", + "epoch 8, testing avg 
cost 13.621968, testing avg accuracy 0.843584\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.41it/s, accuracy=1, cost=0.181] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.51it/s, accuracy=0.842, cost=17.2]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.21it/s, accuracy=0.99, cost=0.233]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 9, training avg cost 0.224984, training avg accuracy 0.995366\n", + "epoch 9, testing avg cost 14.591526, testing avg accuracy 0.841815\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.12it/s, accuracy=0.986, cost=0.407] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.21it/s, accuracy=0.837, cost=16.5]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.48it/s, accuracy=0.993, cost=0.252]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 10, training avg cost 0.217173, training avg accuracy 0.995012\n", + "epoch 10, testing avg cost 14.179818, testing avg accuracy 0.843015\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.26it/s, accuracy=1, cost=0.0473] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.20it/s, accuracy=0.839, cost=16.4]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.27it/s, accuracy=0.994, cost=0.177]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 11, training avg cost 0.200937, training avg accuracy 0.995707\n", + "epoch 11, testing avg cost 14.698985, testing avg accuracy 0.848355\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.34it/s, accuracy=1, cost=0.0852] \n", + "testing minibatch loop: 
100%|██████████| 9/9 [00:00<00:00, 15.63it/s, accuracy=0.843, cost=16.2]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:35, 5.80it/s, accuracy=1, cost=0.0152]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 12, training avg cost 0.071357, training avg accuracy 0.998714\n", + "epoch 12, testing avg cost 14.673738, testing avg accuracy 0.850244\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.31it/s, accuracy=1, cost=0.0163] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.36it/s, accuracy=0.839, cost=17.9]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:33, 6.03it/s, accuracy=0.995, cost=0.0868]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 13, training avg cost 0.058067, training avg accuracy 0.998900\n", + "epoch 13, testing avg cost 15.142712, testing avg accuracy 0.847004\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.44it/s, accuracy=1, cost=0.0349] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.31it/s, accuracy=0.844, cost=17.4]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.48it/s, accuracy=0.996, cost=0.0985]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 14, training avg cost 0.124043, training avg accuracy 0.997135\n", + "epoch 14, testing avg cost 14.900557, testing avg accuracy 0.848673\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.36it/s, accuracy=1, cost=0.0404] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.41it/s, accuracy=0.841, cost=17.4]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.40it/s, accuracy=0.999, cost=0.102]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 15, training avg 
cost 0.326296, training avg accuracy 0.992874\n", + "epoch 15, testing avg cost 14.411384, testing avg accuracy 0.846652\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.22it/s, accuracy=1, cost=0.0866] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.28it/s, accuracy=0.831, cost=19.3]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:33, 6.19it/s, accuracy=1, cost=0.016]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 16, training avg cost 0.099621, training avg accuracy 0.998065\n", + "epoch 16, testing avg cost 15.712006, testing avg accuracy 0.848810\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:32<00:00, 6.35it/s, accuracy=1, cost=0.0964] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.26it/s, accuracy=0.829, cost=18.2]\n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.43it/s, accuracy=0.995, cost=0.0521]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 17, training avg cost 0.054227, training avg accuracy 0.998931\n", + "epoch 17, testing avg cost 15.720788, testing avg accuracy 0.847848\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:33<00:00, 6.10it/s, accuracy=1, cost=0.192] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 14.99it/s, accuracy=0.842, cost=18] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:31, 6.44it/s, accuracy=0.991, cost=0.171]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 18, training avg cost 0.129716, training avg accuracy 0.997698\n", + "epoch 18, testing avg cost 15.702752, testing avg accuracy 0.848713\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:33<00:00, 6.38it/s, 
accuracy=1, cost=0.109] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.21it/s, accuracy=0.847, cost=18] \n", + "minibatch loop: 0%| | 1/206 [00:00<00:32, 6.36it/s, accuracy=1, cost=0.0336]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 19, training avg cost 0.175665, training avg accuracy 0.995999\n", + "epoch 19, testing avg cost 15.015889, testing avg accuracy 0.849449\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:33<00:00, 6.05it/s, accuracy=1, cost=0.0231] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:00<00:00, 15.41it/s, accuracy=0.848, cost=18.8]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 20, training avg cost 0.075544, training avg accuracy 0.998342\n", + "epoch 20, testing avg cost 16.034355, testing avg accuracy 0.846626\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "for e in range(epoch):\n", + " pbar = tqdm(\n", + " range(0, len(train_X), batch_size), desc = 'minibatch loop')\n", + " train_cost, train_accuracy, test_cost, test_accuracy = [], [], [], []\n", + " for i in pbar:\n", + " batch_x = train_X[i : min(i + batch_size, len(train_X))]\n", + " y = train_Y[i : min(i + batch_size, len(train_X))]\n", + " batch_y = sparse_tuple_from(y)\n", + " batch_label, batch_len = pad_sentence_batch(y, 0)\n", + " _, cost, accuracy = sess.run(\n", + " [model.optimizer, model.cost, model.accuracy],\n", + " feed_dict = {model.X: batch_x, model.Y: batch_y, \n", + " model.label: batch_label, model.Y_seq_len: batch_len},\n", + " )\n", + " train_cost.append(cost)\n", + " train_accuracy.append(accuracy)\n", + " pbar.set_postfix(cost = cost, accuracy = accuracy)\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_X), batch_size), desc = 'testing minibatch loop')\n", + " for i in pbar:\n", + 
" batch_x = test_X[i : min(i + batch_size, len(test_X))]\n", + " y = test_Y[i : min(i + batch_size, len(test_X))]\n", + " batch_y = sparse_tuple_from(y)\n", + " batch_label, batch_len = pad_sentence_batch(y, 0)\n", + " cost, accuracy = sess.run(\n", + " [model.cost, model.accuracy],\n", + " feed_dict = {model.X: batch_x, model.Y: batch_y, \n", + " model.label: batch_label, model.Y_seq_len: batch_len},\n", + " )\n", + " \n", + " test_cost.append(cost)\n", + " test_accuracy.append(accuracy)\n", + " \n", + " pbar.set_postfix(cost = cost, accuracy = accuracy)\n", + " print('epoch %d, training avg cost %f, training avg accuracy %f'%(e + 1, np.mean(train_cost), \n", + " np.mean(train_accuracy)))\n", + " \n", + " print('epoch %d, testing avg cost %f, testing avg accuracy %f'%(e + 1, np.mean(test_cost), \n", + " np.mean(test_accuracy)))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "real: say the word fall\n", + "predicted: say the word hall\n" + ] + } + ], + "source": [ + "import random\n", + "\n", + "random_index = random.randint(0, len(test_X) - 1)\n", + "batch_x = test_X[random_index : random_index + 1]\n", + "print(\n", + " 'real:',\n", + " ''.join(\n", + " [idx2char[no] for no in test_Y[random_index : random_index + 1][0]]\n", + " ),\n", + ")\n", + "batch_y = sparse_tuple_from(test_Y[random_index : random_index + 1])\n", + "pred = sess.run(model.preds, feed_dict = {model.X: batch_x})[0]\n", + "print('predicted:', ''.join([idx2char[no] for no in pred]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + 
"nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/3.birnn-ctc-beam.ipynb b/speech-to-text/3.birnn-ctc-beam.ipynb deleted file mode 100644 index 5e27b3f..0000000 --- a/speech-to-text/3.birnn-ctc-beam.ipynb +++ /dev/null @@ -1,1060 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import librosa\n", - "import os\n", - "import tensorflow as tf\n", - "import numpy as np\n", - "from tqdm import tqdm" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "wav_files = [f for f in os.listdir('./data') if f.endswith('.wav')]\n", - "text_files = [f for f in os.listdir('./data') if f.endswith('.txt')]" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████████████████████████████████| 2800/2800 [00:26<00:00, 107.58it/s]\n" - ] - } - ], - "source": [ - "inputs, targets = [], []\n", - "for (wav_file, text_file) in tqdm(zip(wav_files, text_files), total = len(wav_files),ncols=80):\n", - " path = './data/' + wav_file\n", - " try:\n", - " y, sr = librosa.load(path, sr = None)\n", - " except:\n", - " continue\n", - " inputs.append(\n", - " librosa.feature.mfcc(\n", - " y = y, sr = sr, n_mfcc = 40, hop_length = int(0.05 * sr)\n", - " ).T\n", - " )\n", - " with open('./data/' + text_file) as f:\n", - " targets.append(f.read())\n" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "inputs = tf.keras.preprocessing.sequence.pad_sequences(\n", - " inputs, dtype = 'float32', padding = 'post'\n", - ")\n", - "\n", - "chars = list(set([c for target in targets for c in target]))\n", - "num_classes = len(chars) + 2\n", - "\n", - "idx2char = {idx + 1: char for idx, char in 
enumerate(chars)}\n", - "idx2char[0] = ''\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - "\n", - 
" def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - "\n", - " features = self.X\n", - " for n in range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layers),\n", - " cell_bw = cells(size_layers),\n", - " inputs = features,\n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d' % (n),\n", - " )\n", - " features = tf.concat((out_fw, out_bw), 2)\n", - "\n", - " logits = tf.layers.dense(features, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - 
"name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-3\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:56<00:00, 1.61it/s, accuracy=0.774, cost=13.9] \n", - "minibatch loop: 0%| | 0/88 [00:00:13: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 20:30:26.229180 139661149812544 deprecation.py:323] From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 20:30:26.230391 139661149812544 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + 
"W0830 20:30:26.348684 139661149812544 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"Mean:0\", shape=(?,), dtype=int32)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 20:30:27.024843 139661149812544 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"concat_2:0\", shape=(?, ?, 512), dtype=float32) (LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 20:30:28.160248 139661149812544 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 20:30:28.180953 139661149812544 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in 
a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 20:30:28.457204 139661149812544 deprecation.py:323] From :52: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 20:30:29.595860 139661149812544 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:107: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.random.categorical` instead.\n", + "W0830 20:30:30.466234 139661149812544 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py:985: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", + " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. 
\"\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-3\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:53<00:00, 3.94it/s, accuracy=0.871, cost=0.398]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:01<00:00, 7.93it/s, accuracy=0.845, cost=0.485]\n", + "minibatch loop: 0%| | 0/206 [00:00:13: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 21:27:09.179373 
140382336018240 deprecation.py:323] From :36: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 21:27:09.180603 140382336018240 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0830 21:27:09.294915 140382336018240 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"Mean:0\", shape=(?,), dtype=int32)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 21:27:09.940955 140382336018240 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"concat_2:0\", shape=(?, ?, 512), dtype=float32) (LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 21:27:11.023870 140382336018240 lazy_loader.py:50] \n", + "The TensorFlow 
contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 21:27:11.044019 140382336018240 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 21:27:11.318089 140382336018240 deprecation.py:323] From :52: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 21:27:12.459459 140382336018240 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/helper.py:107: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.random.categorical` instead.\n", + "W0830 21:27:13.310252 140382336018240 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py:985: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse 
IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n", + " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-3\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [00:54<00:00, 3.86it/s, accuracy=0.878, cost=0.456]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:01<00:00, 7.77it/s, accuracy=0.839, cost=0.5] \n", + "minibatch loop: 0%| | 0/206 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] 
for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - "\n", - " def cells(reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size_layers,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " def 
attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layers, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layers)\n", - "\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " encoder_out, encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = self.X, \n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32)\n", - " \n", - " encoder_state = tuple(encoder_state[-1] for _ in range(num_layers))\n", - " main = tf.strided_slice(self.X, [0, 0, 0], [batch_size, -1, num_features], [1, 1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1, num_features], 0.0), main], 1)\n", - " decoder_cell = attention(encoder_out, seq_lens)\n", - " dense_layer = tf.layers.Dense(num_classes)\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = decoder_input,\n", - " sequence_length = seq_lens,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(seq_lens))\n", - " \n", - " logits = training_decoder_output.rnn_output\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = 
tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = tf.pad(preds, [[0, 0], [0, tf.reduce_max(self.Y_seq_len)]])\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-3\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:52<00:00, 1.78it/s, accuracy=0.0667, cost=13.7]\n", - "minibatch loop: 0%| | 0/88 
[00:00:64: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 21:39:23.221365 140344873588544 deprecation.py:323] From :67: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-3\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices 
= np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [01:06<00:00, 3.19it/s, accuracy=0.77, cost=11.7] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:01<00:00, 6.69it/s, accuracy=0.773, cost=11.7]\n", + "minibatch loop: 0%| | 0/206 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - 
tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " batch_size = tf.shape(self.X)[0]\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - "\n", - " def cells(reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size_layers,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layers, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layers)\n", - "\n", - " encoder_cells = tf.nn.rnn_cell.MultiRNNCell([cells() for _ in range(num_layers)])\n", - " encoder_out, encoder_state = tf.nn.dynamic_rnn(cell = encoder_cells, \n", - " inputs = self.X, \n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32)\n", - " \n", - " encoder_state = tuple(encoder_state[-1] for _ in range(num_layers))\n", - " main = tf.strided_slice(self.X, [0, 0, 0], [batch_size, -1, num_features], [1, 1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1, num_features], 0.0), main], 1)\n", - " decoder_cell = attention(encoder_out, seq_lens)\n", - " 
dense_layer = tf.layers.Dense(num_classes)\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = decoder_input,\n", - " sequence_length = seq_lens,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(seq_lens))\n", - " \n", - " logits = training_decoder_output.rnn_output\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens,\n", - " ignore_longer_outputs_than_inputs = True,\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-3\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:21<00:00, 4.01it/s, accuracy=0.759, cost=14.7] \n", - "minibatch loop: 0%| | 0/88 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, 
dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def attention(inputs, attention_size):\n", - " hidden_size = inputs.shape[2].value\n", - " w_omega = tf.Variable(\n", - " tf.random_normal([hidden_size, attention_size], stddev = 0.1)\n", - " )\n", - " b_omega = tf.Variable(tf.random_normal([attention_size], stddev = 0.1))\n", - " u_omega = tf.Variable(tf.random_normal([attention_size], stddev = 0.1))\n", - " with tf.name_scope('v'):\n", - " v = tf.tanh(tf.tensordot(inputs, w_omega, axes = 1) + b_omega)\n", - " vu = tf.tensordot(v, u_omega, axes = 1, name = 'vu')\n", - " alphas = tf.nn.softmax(vu, name = 'alphas')\n", - " output = inputs * tf.expand_dims(alphas, -1)\n", - " return output, alphas\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - "\n", - " def cells(size, reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - "\n", - " features = self.X\n", - " for n in 
range(num_layers):\n", - " (out_fw, out_bw), (\n", - " state_fw,\n", - " state_bw,\n", - " ) = tf.nn.bidirectional_dynamic_rnn(\n", - " cell_fw = cells(size_layers),\n", - " cell_bw = cells(size_layers),\n", - " inputs = features,\n", - " sequence_length = seq_lens,\n", - " dtype = tf.float32,\n", - " scope = 'bidirectional_rnn_%d' % (n),\n", - " )\n", - " features = tf.concat((out_fw, out_bw), 2)\n", - " \n", - " features, _ = attention(features, size_layers)\n", - " logits = tf.layers.dense(features, num_classes)\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens,\n", - " ignore_longer_outputs_than_inputs = True,\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated 
and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-3\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:21<00:00, 4.95it/s, accuracy=0.113, cost=71.3] \n", - "minibatch loop: 0%| | 0/88 [00:00:17: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 21:53:04.468036 139802294327104 deprecation.py:323] From :37: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 21:53:04.469763 139802294327104 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0830 21:53:04.589721 139802294327104 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is 
deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 21:53:05.260370 139802294327104 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0830 21:53:06.373947 139802294327104 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 21:53:06.393591 139802294327104 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 21:53:06.672691 139802294327104 deprecation.py:323] From :51: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 21:53:07.765142 139802294327104 deprecation.py:323] From :64: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + 
"Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 21:53:07.801623 139802294327104 deprecation.py:323] From :67: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-3\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = 
np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [01:02<00:00, 3.47it/s, accuracy=0.777, cost=10.9]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:01<00:00, 6.88it/s, accuracy=0.773, cost=11.8]\n", + "minibatch loop: 0%| | 0/206 [00:00:57: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "W0830 22:01:10.120007 140353921439552 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 22:01:10.239052 140353921439552 deprecation.py:323] From :4: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv2D` instead.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"add_3:0\", shape=(?, ?, 512), dtype=float32) Tensor(\"add_4:0\", shape=(?, ?, 512), dtype=float32)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 22:01:11.372579 140353921439552 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * 
https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 22:01:11.675654 140353921439552 deprecation.py:323] From :42: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 22:01:11.681477 140353921439552 deprecation.py:323] From :52: MultiRNNCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 22:01:12.654093 140353921439552 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:961: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 22:01:13.462634 140353921439552 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/attention_wrapper.py:2078: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0830 22:01:13.576039 140353921439552 deprecation.py:323] From :92: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + 
"\n", + "size_layers = 512\n", + "learning_rate = 1e-4\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [02:28<00:00, 1.79it/s, accuracy=0.784, cost=12] \n", + "testing minibatch loop: 
100%|██████████| 9/9 [00:06<00:00, 1.45it/s, accuracy=0.773, cost=13.4]\n", + "minibatch loop: 0%| | 0/206 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_causal(x, size, rate):\n", - " pad_len = (size - 1) * rate\n", - " return tf.pad(x, [[0, 0], [pad_len, 0], [0, 0]])\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " num_blocks = 3,\n", - " block_size = 128,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.Y = 
tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - "\n", - " def residual_block(x, size, rate, block):\n", - " with tf.variable_scope('block_%d_%d' % (block, rate), reuse = False):\n", - " conv_filter = tf.layers.conv1d(\n", - " x,\n", - " x.shape[2] // 4,\n", - " kernel_size = size,\n", - " strides = 1,\n", - " padding = 'same',\n", - " dilation_rate = rate,\n", - " activation = tf.nn.tanh,\n", - " )\n", - " conv_gate = tf.layers.conv1d(\n", - " x,\n", - " x.shape[2] // 4,\n", - " kernel_size = size,\n", - " strides = 1,\n", - " padding = 'same',\n", - " dilation_rate = rate,\n", - " activation = tf.nn.sigmoid,\n", - " )\n", - " out = tf.multiply(conv_filter, conv_gate)\n", - " out = tf.layers.conv1d(\n", - " out,\n", - " block_size,\n", - " kernel_size = 1,\n", - " strides = 1,\n", - " padding = 'same',\n", - " activation = tf.nn.tanh,\n", - " )\n", - " return tf.add(x, out), out\n", - " forward = tf.layers.conv1d(self.X, block_size, kernel_size = 1, strides = 1, padding = 'SAME')\n", - " zeros = tf.zeros_like(forward)\n", - " for i in range(num_blocks):\n", - " for r in [1, 2, 4, 8, 16]:\n", - " forward, s = residual_block(forward, size=7, rate=r, block=i)\n", - " zeros = tf.add(zeros,s)\n", - " forward = tf.layers.conv1d(zeros, block_size, kernel_size = 1, strides = 1, padding = 'SAME')\n", - " logits = tf.layers.conv1d(zeros, num_classes, kernel_size = 1, strides = 1, padding = 'SAME')\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " 
self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:42<00:00, 2.30it/s, accuracy=0.607, cost=18.7] \n", - "minibatch loop: 0%| | 0/88 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in 
targets]" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "27" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "num_classes" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def encoder_block(inp, n_hidden, filter_size):\n", - " inp = tf.expand_dims(inp, 2)\n", - " inp = tf.pad(inp, [[0, 0], [(filter_size[0]-1)//2, (filter_size[0]-1)//2], [0, 0], [0, 0]])\n", - " conv = tf.layers.conv2d(inp, n_hidden, filter_size, padding=\"VALID\", activation=None)\n", - " conv = tf.squeeze(conv, 2)\n", - " return conv\n", - "\n", - "def glu(x):\n", - " return tf.multiply(x[:, :, :tf.shape(x)[2]//2], tf.sigmoid(x[:, :, tf.shape(x)[2]//2:]))\n", - "\n", - "def layer(inp, conv_block, kernel_width, n_hidden, residual=None):\n", - " z = conv_block(inp, n_hidden, (kernel_width, 1))\n", - " return glu(z) + (residual if residual is not None else 0)\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " def cells(reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size_layers,\n", - " initializer = tf.orthogonal_initializer(),\n", - 
" reuse = reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(num_units = size_layers, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layers)\n", - " \n", - " encoder_embedded = self.X\n", - " encoder_embedded = tf.layers.conv1d(encoder_embedded, size_layers, 1)\n", - " e = tf.identity(encoder_embedded)\n", - " for i in range(num_layers * 2):\n", - " z = layer(encoder_embedded, encoder_block, 3, size_layers * 2, encoder_embedded)\n", - " encoder_embedded = z\n", - " \n", - " encoder_output, output_memory = z, z + e\n", - " print(encoder_output, output_memory)\n", - " \n", - " init_state = tf.reduce_mean(output_memory,axis=1)\n", - " encoder_state = tuple(tf.nn.rnn_cell.LSTMStateTuple(c=init_state, h=init_state) for _ in range(num_layers))\n", - " main = tf.strided_slice(self.X, [0, 0, 0], [batch_size, -1, num_features], [1, 1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1, num_features], 0.0), main], 1)\n", - " decoder_cell = attention(encoder_output, seq_lens)\n", - " dense_layer = tf.layers.Dense(num_classes)\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = decoder_input,\n", - " sequence_length = seq_lens,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = 
training_decoder,\n", - " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(seq_lens))\n", - " self.seq_lens = seq_lens\n", - " \n", - " logits = training_decoder_output.rnn_output\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " self.time_major = time_major\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tensor(\"add_3:0\", shape=(?, ?, 128), dtype=float32) Tensor(\"add_4:0\", shape=(?, ?, 128), dtype=float32)\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess 
= tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:38<00:00, 2.55it/s, accuracy=0.0296, cost=239] \n", - "minibatch loop: 0%| | 0/88 [00:00:40: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "W0830 22:08:40.857681 140199276345152 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype 
is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tensor(\"block_1/add_2:0\", shape=(?, ?, 512), dtype=float32)\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "W0830 22:08:41.879571 140199276345152 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0830 22:08:41.880623 140199276345152 deprecation.py:323] From :52: GRUCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.\n", + "W0830 22:08:41.887044 140199276345152 deprecation.py:323] From :68: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "W0830 22:08:41.887933 140199276345152 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "W0830 22:08:41.969996 140199276345152 deprecation.py:506] From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:564: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 22:08:41.983373 140199276345152 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn_cell_impl.py:574: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 22:08:42.168517 140199276345152 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:244: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "W0830 22:08:42.700160 140199276345152 deprecation.py:323] From :72: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 22:08:42.981216 140199276345152 deprecation.py:323] From :75: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-4\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + 
] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [02:39<00:00, 1.58it/s, accuracy=0.101, cost=8.6] \n", + "testing minibatch loop: 100%|██████████| 9/9 [00:07<00:00, 1.32it/s, accuracy=0.0556, cost=10.4]\n", + "minibatch loop: 0%| | 0/206 [00:00'\n", - "char2idx = {char: idx for idx, char in idx2char.items()}\n", - "\n", - "targets = [[char2idx[c] for c in target] for target in targets]" - ] - 
}, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "27" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "num_classes" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "def encoder_block(inp, n_hidden, filter_size):\n", - " inp = tf.expand_dims(inp, 2)\n", - " inp = tf.pad(inp, [[0, 0], [(filter_size[0]-1)//2, (filter_size[0]-1)//2], [0, 0], [0, 0]])\n", - " conv = tf.layers.conv2d(inp, n_hidden, filter_size, padding=\"VALID\", activation=None)\n", - " conv = tf.squeeze(conv, 2)\n", - " return conv\n", - "\n", - "def glu(x):\n", - " return tf.multiply(x[:, :, :tf.shape(x)[2]//2], tf.sigmoid(x[:, :, tf.shape(x)[2]//2:]))\n", - "\n", - "def layer(inp, conv_block, kernel_width, n_hidden, residual=None):\n", - " z = conv_block(inp, n_hidden, (kernel_width, 1))\n", - " return glu(z) + (residual if residual is not None else 0)\n", - "\n", - "def pad_second_dim(x, desired_size):\n", - " padding = tf.tile([[0]], tf.stack([tf.shape(x)[0], desired_size - tf.shape(x)[1]], 0))\n", - " return tf.concat([x, padding], 1)\n", - "\n", - "class Model:\n", - " def __init__(\n", - " self,\n", - " num_layers,\n", - " size_layers,\n", - " learning_rate,\n", - " num_features,\n", - " dropout = 1.0,\n", - " ):\n", - " self.X = tf.placeholder(tf.float32, [None, None, num_features])\n", - " self.label = tf.placeholder(tf.int32, [None, None])\n", - " self.Y_seq_len = tf.placeholder(tf.int32, [None])\n", - " self.Y = tf.sparse_placeholder(tf.int32)\n", - " seq_lens = tf.count_nonzero(\n", - " tf.reduce_sum(self.X, -1), 1, dtype = tf.int32\n", - " )\n", - " batch_size = tf.shape(self.X)[0]\n", - " \n", - " def cells(reuse = False):\n", - " return tf.contrib.rnn.DropoutWrapper(\n", - " tf.nn.rnn_cell.LSTMCell(\n", - " size_layers,\n", - " initializer = tf.orthogonal_initializer(),\n", - " reuse = 
reuse,\n", - " ),\n", - " state_keep_prob = dropout,\n", - " output_keep_prob = dropout,\n", - " )\n", - " def attention(encoder_out, seq_len, reuse=False):\n", - " attention_mechanism = tf.contrib.seq2seq.LuongAttention(num_units = size_layers, \n", - " memory = encoder_out,\n", - " memory_sequence_length = seq_len)\n", - " return tf.contrib.seq2seq.AttentionWrapper(\n", - " cell = tf.nn.rnn_cell.MultiRNNCell([cells(reuse) for _ in range(num_layers)]), \n", - " attention_mechanism = attention_mechanism,\n", - " attention_layer_size = size_layers)\n", - " \n", - " encoder_embedded = self.X\n", - " encoder_embedded = tf.layers.conv1d(encoder_embedded, size_layers, 1)\n", - " e = tf.identity(encoder_embedded)\n", - " for i in range(num_layers * 2):\n", - " z = layer(encoder_embedded, encoder_block, 3, size_layers * 2, encoder_embedded)\n", - " encoder_embedded = z\n", - " \n", - " encoder_output, output_memory = z, z + e\n", - " print(encoder_output, output_memory)\n", - " \n", - " init_state = tf.reduce_mean(output_memory,axis=1)\n", - " encoder_state = tuple(tf.nn.rnn_cell.LSTMStateTuple(c=init_state, h=init_state) for _ in range(num_layers))\n", - " main = tf.strided_slice(self.X, [0, 0, 0], [batch_size, -1, num_features], [1, 1, 1])\n", - " decoder_input = tf.concat([tf.fill([batch_size, 1, num_features], 0.0), main], 1)\n", - " decoder_cell = attention(encoder_output, seq_lens)\n", - " dense_layer = tf.layers.Dense(num_classes)\n", - " \n", - " training_helper = tf.contrib.seq2seq.TrainingHelper(\n", - " inputs = decoder_input,\n", - " sequence_length = seq_lens,\n", - " time_major = False)\n", - " training_decoder = tf.contrib.seq2seq.BasicDecoder(\n", - " cell = decoder_cell,\n", - " helper = training_helper,\n", - " initial_state = decoder_cell.zero_state(batch_size, tf.float32).clone(cell_state=encoder_state),\n", - " output_layer = dense_layer)\n", - " training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(\n", - " decoder = training_decoder,\n", 
- " impute_finished = True,\n", - " maximum_iterations = tf.reduce_max(seq_lens))\n", - " self.seq_lens = seq_lens\n", - " \n", - " logits = training_decoder_output.rnn_output\n", - " time_major = tf.transpose(logits, [1, 0, 2])\n", - " self.time_major = time_major\n", - " decoded, log_prob = tf.nn.ctc_beam_search_decoder(time_major, seq_lens)\n", - " decoded = tf.to_int32(decoded[0])\n", - " self.preds = tf.sparse.to_dense(decoded)\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.ctc_loss(\n", - " self.Y,\n", - " time_major,\n", - " seq_lens\n", - " )\n", - " )\n", - " self.optimizer = tf.train.AdamOptimizer(\n", - " learning_rate = learning_rate\n", - " ).minimize(self.cost)\n", - " \n", - " preds = self.preds[:, :tf.reduce_max(self.Y_seq_len)]\n", - " masks = tf.sequence_mask(self.Y_seq_len, tf.reduce_max(self.Y_seq_len), dtype=tf.float32)\n", - " preds = pad_second_dim(preds, tf.reduce_max(self.Y_seq_len))\n", - " y_t = tf.cast(preds, tf.int32)\n", - " self.prediction = tf.boolean_mask(y_t, masks)\n", - " mask_label = tf.boolean_mask(self.label, masks)\n", - " self.mask_label = mask_label\n", - " correct_pred = tf.equal(self.prediction, mask_label)\n", - " correct_index = tf.cast(correct_pred, tf.float32)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Tensor(\"add_3:0\", shape=(?, ?, 128), dtype=float32) Tensor(\"add_4:0\", shape=(?, ?, 128), dtype=float32)\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = 
tf.InteractiveSession()\n", - "\n", - "size_layers = 128\n", - "learning_rate = 1e-4\n", - "num_layers = 2\n", - "batch_size = 32\n", - "epoch = 50\n", - "\n", - "model = Model(num_layers, size_layers, learning_rate, inputs.shape[2])\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def pad_sentence_batch(sentence_batch, pad_int):\n", - " padded_seqs = []\n", - " seq_lens = []\n", - " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", - " for sentence in sentence_batch:\n", - " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", - " seq_lens.append(len(sentence))\n", - " return padded_seqs, seq_lens\n", - "\n", - "def sparse_tuple_from(sequences, dtype=np.int32):\n", - " indices = []\n", - " values = []\n", - "\n", - " for n, seq in enumerate(sequences):\n", - " indices.extend(zip([n] * len(seq), range(len(seq))))\n", - " values.extend(seq)\n", - "\n", - " indices = np.asarray(indices, dtype=np.int64)\n", - " values = np.asarray(values, dtype=dtype)\n", - " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", - "\n", - " return indices, values, shape" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "minibatch loop: 100%|██████████| 88/88 [00:54<00:00, 2.14it/s, accuracy=0.0519, cost=inf]\n", - "minibatch loop: 0%| | 0/88 [00:00:58: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv1D` instead.\n", + "W0830 22:13:45.878199 140654394951488 deprecation.py:506] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is 
deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "W0830 22:13:47.976171 140654394951488 deprecation.py:323] From :65: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0830 22:13:48.255779 140654394951488 deprecation.py:323] From :68: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.cast` instead.\n", + "W0830 22:13:50.529386 140654394951488 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py:1354: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "\n", + "size_layers = 512\n", + "learning_rate = 1e-4\n", + "num_layers = 2\n", + "batch_size = 64\n", + "epoch = 20\n", + "\n", + "model = Model(num_layers, size_layers, learning_rate, dimension)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " train_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "test_X = tf.keras.preprocessing.sequence.pad_sequences(\n", + " test_X, dtype = 'float32', padding = 'post'\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "def 
pad_sentence_batch(sentence_batch, pad_int):\n", + " padded_seqs = []\n", + " seq_lens = []\n", + " max_sentence_len = max([len(sentence) for sentence in sentence_batch])\n", + " for sentence in sentence_batch:\n", + " padded_seqs.append(sentence + [pad_int] * (max_sentence_len - len(sentence)))\n", + " seq_lens.append(len(sentence))\n", + " return padded_seqs, seq_lens\n", + "\n", + "def sparse_tuple_from(sequences, dtype=np.int32):\n", + " indices = []\n", + " values = []\n", + "\n", + " for n, seq in enumerate(sequences):\n", + " indices.extend(zip([n] * len(seq), range(len(seq))))\n", + " values.extend(seq)\n", + "\n", + " indices = np.asarray(indices, dtype=np.int64)\n", + " values = np.asarray(values, dtype=dtype)\n", + " shape = np.asarray([len(sequences), np.asarray(indices).max(0)[1] + 1], dtype=np.int64)\n", + "\n", + " return indices, values, shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 206/206 [02:22<00:00, 1.82it/s, accuracy=0.741, cost=14.3]\n", + "testing minibatch loop: 100%|██████████| 9/9 [00:06<00:00, 1.44it/s, accuracy=0.726, cost=17.1]\n", + "minibatch loop: 0%| | 0/206 [00:00 0: + cp = np.pad(cp, (start, 0), mode = 'constant')[0 : cp.shape[0]] + else: + cp = np.pad(cp, (0, -start), mode = 'constant')[0 : cp.shape[0]] + return cp + + +with open('train-test.json') as fopen: + wavs = json.load(fopen)['train'] + +if not os.path.exists('augment'): + os.makedirs('augment') + +for no, wav in enumerate(wavs): + try: + root, ext = os.path.splitext(wav) + if (no + 1) % 100 == 0: + print(no + 1, root, ext) + root = root.replace('/', '<>') + root = '%s/%s'%('augment', root) + sample_rate, samples = scipy.io.wavfile.read(wav) + aug = change_pitch_speech(samples) + librosa.output.write_wav( + '%s-1%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + + aug = 
change_amplitude(samples) + librosa.output.write_wav( + '%s-2%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + + aug = add_noise(samples) + librosa.output.write_wav( + '%s-3%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + + aug = add_hpss(samples) + librosa.output.write_wav( + '%s-4%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + + aug = strech(samples) + librosa.output.write_wav( + '%s-5%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + + aug = random_augmentation(samples) + librosa.output.write_wav( + '%s-6%s' % (root, ext), + aug.astype('float32'), + sample_rate, + norm = True, + ) + except Exception as e: + print(e) + pass \ No newline at end of file diff --git a/speech-to-text/caching.ipynb b/speech-to-text/caching.ipynb new file mode 100644 index 0000000..97e5f57 --- /dev/null +++ b/speech-to-text/caching.ipynb @@ -0,0 +1,361 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import glob\n", + "import os\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "from tqdm import tqdm\n", + "import augmentation\n", + "import scipy\n", + "import librosa" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2240, 560)" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wavs = glob.glob('data/*.wav')\n", + "train_X, test_X = train_test_split(wavs, test_size = 0.2)\n", + "len(train_X), len(test_X)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists('augment'):\n", + " os.makedirs('augment')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in 
tqdm(range(len(train_X))):\n", + " wav = train_X[i]\n", + " try:\n", + " root, ext = os.path.splitext(wav)\n", + " root = root.split('/')[1]\n", + " root = '%s/%s' % ('augment', root)\n", + " sample_rate, samples = scipy.io.wavfile.read(wav)\n", + " aug = augmentation.change_pitch_speech(samples)\n", + " librosa.output.write_wav(\n", + " '%s-1%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + " aug = augmentation.change_amplitude(samples)\n", + " librosa.output.write_wav(\n", + " '%s-2%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + "\n", + " aug = augmentation.add_noise(samples)\n", + " librosa.output.write_wav(\n", + " '%s-3%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + "\n", + " aug = augmentation.add_hpss(samples)\n", + " librosa.output.write_wav(\n", + " '%s-4%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + "\n", + " aug = augmentation.strech(samples)\n", + " librosa.output.write_wav(\n", + " '%s-5%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + "\n", + " aug = augmentation.random_augmentation(samples)\n", + " librosa.output.write_wav(\n", + " '%s-6%s' % (root, ext),\n", + " aug.astype('float32'),\n", + " sample_rate,\n", + " norm = True,\n", + " )\n", + " except:\n", + " pass\n" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "import soundfile\n", + "\n", + "sampling_rate = 22050\n", + "n_fft = 2048\n", + "frame_shift = 0.0125\n", + "frame_length = 0.05\n", + "hop_length = int(sampling_rate * frame_shift)\n", + "win_length = int(sampling_rate * frame_length)\n", + "n_mels = 80\n", + "reduction_factor = 5\n", + "\n", + "def compute_spectrogram_feature(\n", + " samples,\n", + " sample_rate = 16000,\n", + " 
stride_ms = 10.0,\n", + " window_ms = 20.0,\n", + " max_freq = None,\n", + " eps = 1e-14,\n", + "):\n", + " if max_freq is None:\n", + " max_freq = sample_rate / 2\n", + " if max_freq > sample_rate / 2:\n", + " raise ValueError(\n", + " 'max_freq must not be greater than half of sample rate.'\n", + " )\n", + "\n", + " if stride_ms > window_ms:\n", + " raise ValueError('Stride size must not be greater than window size.')\n", + "\n", + " stride_size = int(0.001 * sample_rate * stride_ms)\n", + " window_size = int(0.001 * sample_rate * window_ms)\n", + "\n", + " # Extract strided windows\n", + " truncate_size = (len(samples) - window_size) % stride_size\n", + " samples = samples[: len(samples) - truncate_size]\n", + " nshape = (window_size, (len(samples) - window_size) // stride_size + 1)\n", + " nstrides = (samples.strides[0], samples.strides[0] * stride_size)\n", + " windows = np.lib.stride_tricks.as_strided(\n", + " samples, shape = nshape, strides = nstrides\n", + " )\n", + " assert np.all(\n", + " windows[:, 1] == samples[stride_size : (stride_size + window_size)]\n", + " )\n", + "\n", + " # Window weighting, squared Fast Fourier Transform (fft), scaling\n", + " weighting = np.hanning(window_size)[:, None]\n", + " fft = np.fft.rfft(windows * weighting, axis = 0)\n", + " fft = np.absolute(fft)\n", + " fft = fft ** 2\n", + " scale = np.sum(weighting ** 2) * sample_rate\n", + " fft[1:-1, :] *= 2.0 / scale\n", + " fft[(0, -1), :] /= scale\n", + " # Prepare fft frequency list\n", + " freqs = float(sample_rate) / window_size * np.arange(fft.shape[0])\n", + "\n", + " # Compute spectrogram feature\n", + " ind = np.where(freqs <= max_freq)[0][-1] + 1\n", + " specgram = np.log(fft[:ind, :] + eps)\n", + " return np.transpose(specgram, (1, 0))\n", + "\n", + "\n", + "def get_spectrogram(fpath):\n", + " y, sr = librosa.load(fpath, sr = sampling_rate)\n", + " D = librosa.stft(\n", + " y = y, n_fft = n_fft, hop_length = hop_length, win_length = win_length\n", + " )\n", + " 
magnitude = np.abs(D)\n", + " power = magnitude ** 2\n", + " S = librosa.feature.melspectrogram(S = power, n_mels = n_mels)\n", + " return np.transpose(S.astype(np.float32))\n", + "\n", + "\n", + "def reduce_frames(x, r_factor):\n", + " T, C = x.shape\n", + " num_paddings = reduction_factor - (T % r_factor) if T % r_factor != 0 else 0\n", + " padded = np.pad(x, [[0, num_paddings], [0, 0]], 'constant')\n", + " return np.reshape(padded, (-1, C * r_factor))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(33, 400)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "spectrogram = get_spectrogram(wavs[0])\n", + "spectrogram = reduce_frames(spectrogram, reduction_factor)\n", + "spectrogram.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "augments = glob.glob('augment/*.wav')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[-19.36971185, -16.57376858, -18.71725863, ..., -23.41501521,\n", + " -23.68784775, -24.03949327],\n", + " [-15.61767618, -16.41841276, -19.80099433, ..., -20.95772327,\n", + " -21.39394458, -24.18905672],\n", + " [-15.02933384, -15.46920427, -19.5143906 , ..., -20.94875192,\n", + " -21.02911394, -21.28899505],\n", + " ...,\n", + " [-32.2361913 , -32.2361913 , -32.2361913 , ..., -32.2361913 ,\n", + " -32.2361913 , -32.2361913 ],\n", + " [-32.2361913 , -32.2361913 , -32.2361913 , ..., -32.2361913 ,\n", + " -32.2361913 , -32.2361913 ],\n", + " [-32.2361913 , -32.2361913 , -32.2361913 , ..., -32.2361913 ,\n", + " -32.2361913 , -32.2361913 ]])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data, _ = soundfile.read(augments[1])\n", + "spectrogram = compute_spectrogram_feature(data)\n", + 
"spectrogram" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "if not os.path.exists('spectrogram-train'):\n", + " os.mkdir('spectrogram-train')\n", + "\n", + "if not os.path.exists('spectrogram-test'):\n", + " os.mkdir('spectrogram-test')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 2240/2240 [02:26<00:00, 16.25it/s]\n", + "100%|██████████| 13642/13642 [15:26<00:00, 13.94it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "for i in tqdm(range(len(train_X))):\n", + " i = train_X[i]\n", + " loc = 'spectrogram-train/%s.npy'%(os.path.basename(i).split('.')[0])\n", + " \n", + " spectrogram = get_spectrogram(i)\n", + " spectrogram = reduce_frames(spectrogram, reduction_factor)\n", + " np.save(loc, spectrogram)\n", + " \n", + "for i in tqdm(range(len(augments))):\n", + " i = augments[i]\n", + " loc = 'spectrogram-train/%s.npy'%(os.path.basename(i).split('.')[0])\n", + " spectrogram = get_spectrogram(i)\n", + " spectrogram = reduce_frames(spectrogram, reduction_factor)\n", + " np.save(loc, spectrogram)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 560/560 [00:36<00:00, 15.41it/s]\n" + ] + } + ], + "source": [ + "for i in tqdm(range(len(test_X))):\n", + " i = test_X[i]\n", + " loc = 'spectrogram-test/%s.npy'%(os.path.basename(i).split('.')[0])\n", + " spectrogram = get_spectrogram(i)\n", + " spectrogram = reduce_frames(spectrogram, reduction_factor)\n", + " np.save(loc, spectrogram)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + 
"language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/wav2vec-preprocessing.ipynb b/speech-to-text/wav2vec-preprocessing.ipynb new file mode 100644 index 0000000..86e1119 --- /dev/null +++ b/speech-to-text/wav2vec-preprocessing.ipynb @@ -0,0 +1,139 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import librosa\n", + "import glob" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "train = glob.glob('spectrogram-train/*.npy')\n", + "x = []\n", + "for fpath in train:\n", + " fpath = fpath.split('/')[1]\n", + " splitted = fpath.split('-')\n", + " if len(splitted) == 2:\n", + " splitted[1] = splitted[1].split('.')[1]\n", + " fpath = splitted[0] + '.' 
+ splitted[1]\n", + " fpath = fpath.replace('.npy','.wav')\n", + " x.append('data/' + fpath)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "16341" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "augment = glob.glob('augment/*.wav')\n", + "x.extend(augment)\n", + "x = list(set(x))\n", + "len(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "test_ = glob.glob('spectrogram-test/*.npy')\n", + "test = []\n", + "for t in test_:\n", + " f = t.split('/')[1].replace('.npy', '.wav')\n", + " test.append('data/'+f)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 16341/16341 [15:07<00:00, 18.01it/s]\n", + "100%|██████████| 560/560 [00:30<00:00, 18.51it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "X = []\n", + "for i in tqdm(range(len(x))):\n", + " y, sr = librosa.load(x[i], sr = 16000)\n", + " X.append(y)\n", + " \n", + "Y = []\n", + "for i in tqdm(range(len(test))):\n", + " y, sr = librosa.load(test[i], sr = 16000)\n", + " Y.append(y)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "with open('train-wav.pkl', 'wb') as fopen:\n", + " pickle.dump({'X': X, 'x': x}, fopen)\n", + " \n", + "with open('test-wav.pkl', 'wb') as fopen:\n", + " pickle.dump({'Y': Y, 'y': test}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + 
"mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/wav2vec-pytorch.ipynb b/speech-to-text/wav2vec-pytorch.ipynb new file mode 100644 index 0000000..0f6d8f0 --- /dev/null +++ b/speech-to-text/wav2vec-pytorch.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "scrolled": false + }, + "outputs": [], + "source": [ + "import librosa\n", + "# import tensorflow as tf\n", + "import glob\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# follow hyperparameters from here, https://github.com/pytorch/fairseq/tree/master/examples/wav2vec\n", + "\n", + "features = [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)]\n", + "aggs = [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), \n", + " (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)]\n", + "num_negatives = 10\n", + "prediction_steps = 12\n", + "learning_rate = 1e-5\n", + "min_learning_rate = 1e-9\n", + "max_learning_rate = 0.005\n", + "learning_scheduler = 'cosine'\n", + "max_update = 400000\n", + "residual_scale = 0.5\n", + "log_compression = True\n", + "warmup_updates = 50\n", + "warmup_init_lr = 1e-07\n", + "batch_size = 32\n", + "epoch = 10\n", + "total_steps = batch_size * epoch" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from torch import nn\n", + "import torch.functional as F" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/plain": [ + "array([[[ 1.62434536, -0.61175641, -0.52817175, -1.07296862,\n", + " 0.86540763, -2.3015387 , 1.74481176],\n", + " [-0.7612069 , 0.3190391 , -0.24937038, 1.46210794,\n", + " -2.06014071, -0.3224172 , -0.38405435],\n", + " [ 1.13376944, -1.09989127, -0.17242821, -0.87785842,\n", + " 0.04221375, 0.58281521, -1.10061918],\n", + " [ 1.14472371, 0.90159072, 0.50249434, 0.90085595,\n", + " -0.68372786, -0.12289023, -0.93576943],\n", + " [-0.26788808, 0.53035547, -0.69166075, -0.39675353,\n", + " -0.6871727 , -0.84520564, -0.67124613],\n", + " [-0.0126646 , -1.11731035, 0.2344157 , 1.65980218,\n", + " 0.74204416, -0.19183555, -0.88762896],\n", + " [-0.74715829, 1.6924546 , 0.05080775, -0.63699565,\n", + " 0.19091548, 2.10025514, 0.12015895],\n", + " [ 0.61720311, 0.30017032, -0.35224985, -1.1425182 ,\n", + " -0.34934272, -0.20889423, 0.58662319],\n", + " [ 0.83898341, 0.93110208, 0.28558733, 0.88514116,\n", + " -0.75439794, 1.25286816, 0.51292982],\n", + " [-0.29809284, 0.48851815, -0.07557171, 1.13162939,\n", + " 1.51981682, 2.18557541, -1.39649634]],\n", + "\n", + " [[-1.44411381, -0.50446586, 0.16003707, 0.87616892,\n", + " 0.31563495, -2.02220122, -0.30620401],\n", + " [ 0.82797464, 0.23009474, 0.76201118, -0.22232814,\n", + " -0.20075807, 0.18656139, 0.41005165],\n", + " [ 0.19829972, 0.11900865, -0.67066229, 0.37756379,\n", + " 0.12182127, 1.12948391, 1.19891788],\n", + " [ 0.18515642, -0.37528495, -0.63873041, 0.42349435,\n", + " 0.07734007, -0.34385368, 0.04359686],\n", + " [-0.62000084, 0.69803203, -0.44712856, 1.2245077 ,\n", + " 0.40349164, 0.59357852, -1.09491185],\n", + " [ 0.16938243, 0.74055645, -0.9537006 , -0.26621851,\n", + " 0.03261455, -1.37311732, 0.31515939],\n", + " [ 0.84616065, -0.85951594, 0.35054598, -1.31228341,\n", + " -0.03869551, -1.61577235, 1.12141771],\n", + " [ 0.40890054, -0.02461696, -0.77516162, 1.27375593,\n", + " 1.96710175, -1.85798186, 1.23616403],\n", + " [ 1.62765075, 0.3380117 , 
-1.19926803, 0.86334532,\n", + " -0.1809203 , -0.60392063, -1.23005814],\n", + " [ 0.5505375 , 0.79280687, -0.62353073, 0.52057634,\n", + " -1.14434139, 0.80186103, 0.0465673 ]]])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.random.seed(1)\n", + "x = np.random.normal(size = (2, 10, 7))\n", + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([2, 10, 7])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = torch.from_numpy(x)\n", + "x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 10, 7)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bsz, fsz, tsz = x.shape\n", + "bsz, fsz, tsz" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([10, 14])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y = x.transpose(0, 1)\n", + "y = y.contiguous().view(fsz, -1)\n", + "y.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "high = tsz\n", + "n_negatives = 10\n", + "# neg_idxs = torch.randint(low=0, high=high, size=(bsz, n_negatives * tsz))\n", + "# neg_idxs" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "neg_idxs = torch.from_numpy(np.array([[\n", + " 1, 2, 3, 1, 4, 0, 5, 6, 1, 2, 0, 4, 2, 1, 0, 5, 4, 5, 4, 6, 6, 4, 1, 6,\n", + " 6, 3, 4, 4, 5, 0, 1, 5, 4, 4, 1, 1, 0, 2, 0, 6, 2, 6, 3, 4, 5, 6, 2, 4,\n", + " 0, 2, 1, 2, 6, 4, 2, 4, 0, 2, 4, 2, 1, 0, 4, 6, 6, 4, 4, 2, 3, 4],\n", + " [4, 0, 3, 4, 2, 4, 4, 1, 0, 6, 3, 1, 5, 6, 
4, 3, 6, 4, 0, 5, 1, 0, 4, 2,\n", + " 2, 0, 4, 1, 4, 3, 2, 2, 0, 4, 2, 3, 4, 6, 6, 2, 4, 0, 3, 1, 6, 2, 4, 5,\n", + " 1, 3, 1, 3, 3, 1, 3, 0, 3, 6, 0, 5, 2, 4, 5, 6, 0, 1, 2, 3, 6, 3]]))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([[ 1, 2, 3, 1, 4, 0, 5, 6, 1, 2, 0, 4, 2, 1, 0, 5, 4, 5,\n", + " 4, 6, 6, 4, 1, 6, 6, 3, 4, 4, 5, 0, 1, 5, 4, 4, 1, 1,\n", + " 0, 2, 0, 6, 2, 6, 3, 4, 5, 6, 2, 4, 0, 2, 1, 2, 6, 4,\n", + " 2, 4, 0, 2, 4, 2, 1, 0, 4, 6, 6, 4, 4, 2, 3, 4],\n", + " [11, 7, 10, 11, 9, 11, 11, 8, 7, 13, 10, 8, 12, 13, 11, 10, 13, 11,\n", + " 7, 12, 8, 7, 11, 9, 9, 7, 11, 8, 11, 10, 9, 9, 7, 11, 9, 10,\n", + " 11, 13, 13, 9, 11, 7, 10, 8, 13, 9, 11, 12, 8, 10, 8, 10, 10, 8,\n", + " 10, 7, 10, 13, 7, 12, 9, 11, 12, 13, 7, 8, 9, 10, 13, 10]])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "for i in range(1, bsz):\n", + " neg_idxs[i] += i * high\n", + " \n", + "neg_idxs" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([10, 140])" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "negs = y[..., neg_idxs.view(-1)]\n", + "negs.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([[-0.6118, -0.5282, -1.0730, ..., 0.8762, -0.3062, 0.8762],\n", + " [ 0.3190, -0.2494, 1.4621, ..., -0.2223, 0.4101, -0.2223],\n", + " [-1.0999, -0.1724, -0.8779, ..., 0.3776, 1.1989, 0.3776],\n", + " ...,\n", + " [ 0.3002, -0.3522, -1.1425, ..., 1.2738, 1.2362, 1.2738],\n", + " [ 0.9311, 0.2856, 0.8851, ..., 0.8633, -1.2301, 0.8633],\n", + " [ 0.4885, -0.0756, 1.1316, ..., 0.5206, 0.0466, 0.5206]],\n", + " dtype=torch.float64)" + ] + }, + "execution_count": 13, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "negs" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([10, 2, 10, 7])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "negs = negs.view(fsz, bsz, n_negatives, tsz).permute(2, 1, 0, 3)\n", + "negs.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([[[-0.6118, -0.5282, -1.0730, -0.6118, 0.8654, 1.6243, -2.3015],\n", + " [ 0.3190, -0.2494, 1.4621, 0.3190, -2.0601, -0.7612, -0.3224],\n", + " [-1.0999, -0.1724, -0.8779, -1.0999, 0.0422, 1.1338, 0.5828],\n", + " [ 0.9016, 0.5025, 0.9009, 0.9016, -0.6837, 1.1447, -0.1229],\n", + " [ 0.5304, -0.6917, -0.3968, 0.5304, -0.6872, -0.2679, -0.8452],\n", + " [-1.1173, 0.2344, 1.6598, -1.1173, 0.7420, -0.0127, -0.1918],\n", + " [ 1.6925, 0.0508, -0.6370, 1.6925, 0.1909, -0.7472, 2.1003],\n", + " [ 0.3002, -0.3522, -1.1425, 0.3002, -0.3493, 0.6172, -0.2089],\n", + " [ 0.9311, 0.2856, 0.8851, 0.9311, -0.7544, 0.8390, 1.2529],\n", + " [ 0.4885, -0.0756, 1.1316, 0.4885, 1.5198, -0.2981, 2.1856]],\n", + "\n", + " [[ 0.3156, -1.4441, 0.8762, 0.3156, 0.1600, 0.3156, 0.3156],\n", + " [-0.2008, 0.8280, -0.2223, -0.2008, 0.7620, -0.2008, -0.2008],\n", + " [ 0.1218, 0.1983, 0.3776, 0.1218, -0.6707, 0.1218, 0.1218],\n", + " [ 0.0773, 0.1852, 0.4235, 0.0773, -0.6387, 0.0773, 0.0773],\n", + " [ 0.4035, -0.6200, 1.2245, 0.4035, -0.4471, 0.4035, 0.4035],\n", + " [ 0.0326, 0.1694, -0.2662, 0.0326, -0.9537, 0.0326, 0.0326],\n", + " [-0.0387, 0.8462, -1.3123, -0.0387, 0.3505, -0.0387, -0.0387],\n", + " [ 1.9671, 0.4089, 1.2738, 1.9671, -0.7752, 1.9671, 1.9671],\n", + " [-0.1809, 1.6277, 0.8633, -0.1809, -1.1993, -0.1809, -0.1809],\n", + " [-1.1443, 0.5505, 0.5206, -1.1443, -0.6235, -1.1443, 
-1.1443]]],\n", + " dtype=torch.float64)" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "negs[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "torch.Size([1, 2, 10, 7]) torch.Size([10, 2, 10, 7])\n" + ] + }, + { + "data": { + "text/plain": [ + "torch.Size([11, 2, 10, 7])" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y = x[:].unsqueeze(0)\n", + "print(y.shape, negs.shape)\n", + "targets = torch.cat([y, negs], dim=0)\n", + "targets.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([11, 2, 10, 7, 12])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "project_to_steps = nn.ConvTranspose2d(10, 10, (1, 12))\n", + "s = project_to_steps(x.unsqueeze(-1).float()).unsqueeze(0).expand(targets.size(0), -1, -1, -1, -1)\n", + "s.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "with open('convtranspose.pkl', 'wb') as fopen:\n", + " pickle.dump(s.detach().numpy().tolist(), fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n" + ] + } + ], + "source": [ + "import math\n", + "\n", + "jin = 0\n", + "rin = 0\n", + "for _, k, stride in features:\n", + " if rin == 0:\n", + " rin = k\n", + " rin = rin + (k - 1) * jin\n", + " if jin == 0:\n", + " jin = stride\n", + " else:\n", + " jin *= stride\n", + "offset = math.ceil(rin / jin)\n", + "\n", + "offset = int(offset)\n", + "print(offset)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + 
"metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(torch.Size([220]), torch.Size([220]))" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "copies, bsz, dim, tsz, steps = s.shape\n", + "steps = min(steps, tsz - offset)\n", + "predictions = s.new(bsz * copies * (tsz - offset + 1) * steps - ((steps + 1) * steps // 2) * copies * bsz)\n", + "labels = torch.zeros_like(predictions)\n", + "predictions.shape, labels.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(torch.Size([11, 2, 10, 7, 12]),\n", + " torch.Size([11, 2, 10, 7]),\n", + " torch.Size([11, 2, 10, 4]),\n", + " torch.Size([11, 2, 10, 4]))" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.shape, targets.shape, s[..., :-offset, i].shape, targets[..., offset:].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 8 88 3 torch.Size([11, 2, 10, 4]) torch.Size([11, 2, 10, 4])\n", + "tensor([0., 0., 0., 0., 0., 0., 0., 0.])\n", + "88 6 154 4 torch.Size([11, 2, 10, 3]) torch.Size([11, 2, 10, 3])\n", + "tensor([0., 0., 0., 0., 0., 0.])\n", + "154 2 176 6 torch.Size([11, 2, 10, 1]) torch.Size([11, 2, 10, 1])\n", + "tensor([0., 0.])\n", + "176 -4 132 9 torch.Size([11, 2, 10, 0]) torch.Size([11, 2, 10, 0])\n", + "tensor([])\n" + ] + }, + { + "data": { + "text/plain": [ + "tensor([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 
1., 1.,\n", + " 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0.])" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "start = end = 0\n", + "for i in range(steps):\n", + " offset = i + offset\n", + " end = start + (tsz - offset) * bsz * copies\n", + " pos_num = (end - start) // copies\n", + " print(start, pos_num, end, offset, s[..., :-offset, i].shape, targets[..., offset:].shape)\n", + " predictions[start:end] = (s[..., :-offset, i].float() * targets[..., offset:].float()).sum(dim=2).flatten()\n", + " print(labels[start:start + pos_num])\n", + " labels[start:start + pos_num] = 1.\n", + " start = end\n", + " \n", + "labels" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 3.66452038e-01, 3.81497800e-01, 1.17975384e-01, 3.86789769e-01,\n", + " -5.22472978e-01, 2.96876747e-02, 1.16604976e-02, -2.87122130e-01,\n", + " -5.71662426e-01, 3.81497800e-01, 1.39530897e-01, 4.34068859e-01,\n", + " -6.46445900e-04, 5.10506220e-02, 1.72045007e-01, -1.56508684e-02,\n", + " 3.22836012e-01, 3.81497800e-01, 1.98190331e-01, 6.12615168e-01,\n", + " -5.22472978e-01, 7.26154149e-02, 1.16604976e-02, -2.87122130e-01,\n", + " 1.93178654e-04, 3.81497800e-01, 3.89128476e-01, 3.86789769e-01,\n", + " -6.46445900e-04, -1.97719205e-02, 1.16604976e-02, 1.03846192e-01,\n", + " -3.02614093e-01, -7.46433139e-01, 4.23885345e-01, 
6.11973643e-01,\n", + " 4.69868928e-01, -1.97719205e-02, 1.72045007e-01, 1.03846192e-01,\n", + " 1.93178654e-04, 3.81497800e-01, 4.23885345e-01, 6.12615168e-01,\n", + " 4.69868928e-01, -1.97719205e-02, 1.72045007e-01, -1.56631157e-01,\n", + " 3.22836012e-01, 1.09955943e+00, 1.98190331e-01, 3.86789769e-01,\n", + " 2.98553944e-01, 5.10506220e-02, 1.72045007e-01, -3.18063974e-01,\n", + " -3.02614093e-01, 4.66346890e-02, 4.23885345e-01, -5.15238345e-01,\n", + " 4.69868928e-01, 2.96876747e-02, 1.16604976e-02, 1.03846192e-01,\n", + " -3.02614093e-01, 3.81497800e-01, 1.98190331e-01, 6.11973643e-01,\n", + " -5.22472978e-01, 7.26154149e-02, 1.33439168e-01, -3.18063974e-01,\n", + " 3.25847238e-01, 1.02858454e-01, 1.39530897e-01, 6.11973643e-01,\n", + " -1.59006226e+00, 5.10506220e-02, 1.72045007e-01, 4.09733653e-01,\n", + " 3.02693009e-01, 4.66346890e-02, 1.06446333e-02, 6.11973643e-01,\n", + " 4.69868928e-01, -3.68873119e-01, 2.99941838e-01, -1.95966244e-01,\n", + " -4.13644135e-01, -1.29237294e-01, 1.60333395e-01, -3.52846593e-01,\n", + " -1.56483725e-01, 3.66783708e-01, -4.13644135e-01, -2.71202117e-01,\n", + " -1.61071479e-01, -3.36294562e-01, -2.24780828e-01, 2.57270455e-01,\n", + " -4.13644135e-01, -1.69466317e-01, -4.43083167e-01, 5.34787297e-01,\n", + " -1.56483725e-01, 3.66783708e-01, -4.13644135e-01, -1.87030524e-01,\n", + " 1.60333395e-01, -4.03155461e-02, -1.56483725e-01, -3.08349550e-01,\n", + " -6.05947077e-01, 2.67160714e-01, 7.26729155e-01, -4.03155461e-02,\n", + " -2.24780828e-01, -3.08349550e-01, -4.13644135e-01, 2.67160714e-01,\n", + " -4.43083167e-01, -4.03155461e-02, -2.24780828e-01, 2.38947958e-01,\n", + " 6.00431621e-01, -1.69466317e-01, 1.60333395e-01, -3.36294562e-01,\n", + " -2.24780828e-01, -3.04946840e-01, 1.82850763e-01, 2.67160714e-01,\n", + " -1.42082041e-02, -3.52846593e-01, -1.56483725e-01, -3.08349550e-01,\n", + " -4.13644135e-01, -1.69466317e-01, 7.26729155e-01, 5.34787297e-01,\n", + " 4.26046550e-02, -3.04946840e-01, 7.43078351e-01, 
-2.71202117e-01,\n", + " 7.26729155e-01, -3.36294562e-01, -2.24780828e-01, -4.00821835e-01,\n", + " 1.82850763e-01, 4.71313477e-01, 7.26729155e-01, 1.44188344e-01,\n", + " -3.04346308e-02, -3.80576074e-01, 3.49486321e-01, 1.49221234e-02,\n", + " -1.40055329e-01, -1.21881872e-01, 8.19714814e-02, 1.49221234e-02,\n", + " 3.49486321e-01, -1.40668884e-01, 2.77836770e-01, -1.40668884e-01,\n", + " 8.19714814e-02, 1.61352471e-01, 3.49486321e-01, 1.23742744e-02,\n", + " 4.80696619e-01, -1.40668884e-01, 2.77836770e-01, 1.23742744e-02,\n", + " 2.77836770e-01, -3.08042616e-02, 2.77836770e-01, -2.98715889e-01],\n", + " dtype=float32)" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "predictions.detach().numpy()[:-4 * 11]" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "# torch.nn.functional.binary_cross_entropy(predictions, labels)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/torch/nn/functional.py:1351: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.\n", + " warnings.warn(\"nn.functional.sigmoid is deprecated. 
Use torch.sigmoid instead.\")\n" + ] + }, + { + "data": { + "text/plain": [ + "tensor([0.5906, 0.5942, 0.5295, 0.5955, 0.3723, 0.5074, 0.5029, 0.4287, 0.3609,\n", + " 0.5942, 0.5348, 0.6068, 0.4998, 0.5128, 0.5429, 0.4961, 0.5800, 0.5942,\n", + " 0.5494, 0.6485, 0.3723, 0.5181, 0.5029, 0.4287, 0.5000, 0.5942, 0.5961,\n", + " 0.5955, 0.4998, 0.4951, 0.5029, 0.5259, 0.4249, 0.3216, 0.6044, 0.6484,\n", + " 0.6154, 0.4951, 0.5429, 0.5259, 0.5000, 0.5942, 0.6044, 0.6485, 0.6154,\n", + " 0.4951, 0.5429, 0.4609, 0.5800, 0.7502, 0.5494, 0.5955, 0.5741, 0.5128,\n", + " 0.5429, 0.4211, 0.4249, 0.5117, 0.6044, 0.3740, 0.6154, 0.5074, 0.5029,\n", + " 0.5259, 0.4249, 0.5942, 0.5494, 0.6484, 0.3723, 0.5181, 0.5333, 0.4211,\n", + " 0.5807, 0.5257, 0.5348, 0.6484, 0.1694, 0.5128, 0.5429, 0.6010, 0.5751,\n", + " 0.5117, 0.5027, 0.6484, 0.6154, 0.4088, 0.5744, 0.4512, 0.3980, 0.4677,\n", + " 0.5400, 0.4127, 0.4610, 0.5907, 0.3980, 0.4326, 0.4598, 0.4167, 0.4440,\n", + " 0.5640, 0.3980, 0.4577, 0.3910, 0.6306, 0.4610, 0.5907, 0.3980, 0.4534,\n", + " 0.5400, 0.4899, 0.4610, 0.4235, 0.3530, 0.5664, 0.6741, 0.4899, 0.4440,\n", + " 0.4235, 0.3980, 0.5664, 0.3910, 0.4899, 0.4440, 0.5595, 0.6458, 0.4577,\n", + " 0.5400, 0.4167, 0.4440, 0.4243, 0.5456, 0.5664, 0.4964, 0.4127, 0.4610,\n", + " 0.4235, 0.3980, 0.4577, 0.6741, 0.6306, 0.5106, 0.4243, 0.6777, 0.4326,\n", + " 0.6741, 0.4167, 0.4440, 0.4011, 0.5456, 0.6157, 0.6741, 0.5360, 0.4924,\n", + " 0.4060, 0.5865, 0.5037, 0.4650, 0.4696, 0.5205, 0.5037, 0.5865, 0.4649,\n", + " 0.5690, 0.4649, 0.5205, 0.5403, 0.5865, 0.5031, 0.6179, 0.4649, 0.5690,\n", + " 0.5031, 0.5690, 0.4923, 0.5690, 0.4259, 0.5061, 0.5000, 0.5000, 0.0000,\n", + " 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 1.0000,\n", + " 0.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.9986, 1.0000,\n", + " 0.5000, 0.5000, 0.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,\n", + " 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.3149, 0.5000, 1.0000, 
1.0000,\n", + " 0.5000, 0.0000, 0.5000, 0.5000], grad_fn=)" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "torch.nn.functional.sigmoid(predictions)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/wav2vec-tf.ipynb b/speech-to-text/wav2vec-tf.ipynb new file mode 100644 index 0000000..acb0ccb --- /dev/null +++ b/speech-to-text/wav2vec-tf.ipynb @@ -0,0 +1,518 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import librosa\n", + "import tensorflow as tf\n", + "import glob\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# follow hyperparameters from here, https://github.com/pytorch/fairseq/tree/master/examples/wav2vec\n", + "\n", + "features = [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)]\n", + "aggs = [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), \n", + " (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)]\n", + "num_negatives = 10\n", + "prediction_steps = 
12\n", + "learning_rate = 1e-5\n", + "min_learning_rate = 1e-9\n", + "max_learning_rate = 0.005\n", + "learning_scheduler = 'cosine'\n", + "max_update = 400000\n", + "residual_scale = 0.5\n", + "log_compression = True\n", + "warmup_updates = 50\n", + "warmup_init_lr = 1e-07\n", + "batch_size = 32\n", + "epoch = 10\n", + "total_steps = batch_size * epoch" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "tf.compat.v1.enable_eager_execution()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 7, 10)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.random.seed(1)\n", + "\n", + "# 2 batch, 10 dimension, 7 t\n", + "x = np.transpose(np.random.normal(size = (2, 10, 7)), (0, 2, 1))\n", + "x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "TensorShape([Dimension(10), Dimension(2), Dimension(10), Dimension(7)])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def negative_sample(y):\n", + " bsz = tf.shape(y)[0]\n", + " fsz = tf.shape(y)[1]\n", + " tsz = tf.shape(y)[2]\n", + " \n", + " # b, d, t -> d, b, t\n", + " y = tf.transpose(y, [1, 0, 2])\n", + " y = tf.reshape(y, (fsz, -1))\n", + " # neg_idxs = tf.random_uniform((bsz, num_negatives * tsz), minval=0, maxval=tsz, dtype=tf.int32)\n", + " \n", + " neg_idxs = np.array([[\n", + " 1, 2, 3, 1, 4, 0, 5, 6, 1, 2, 0, 4, 2, 1, 0, 5, 4, 5, 4, 6, 6, 4, 1, 6,\n", + " 6, 3, 4, 4, 5, 0, 1, 5, 4, 4, 1, 1, 0, 2, 0, 6, 2, 6, 3, 4, 5, 6, 2, 4,\n", + " 0, 2, 1, 2, 6, 4, 2, 4, 0, 2, 4, 2, 1, 0, 4, 6, 6, 4, 4, 2, 3, 4],\n", + " [4, 0, 3, 4, 2, 4, 4, 1, 0, 6, 3, 1, 5, 6, 4, 3, 6, 4, 0, 5, 1, 0, 4, 2,\n", + " 2, 0, 4, 1, 4, 3, 2, 2, 0, 4, 2, 3, 4, 6, 6, 2, 4, 0, 3, 
1, 6, 2, 4, 5,\n", + " 1, 3, 1, 3, 3, 1, 3, 0, 3, 6, 0, 5, 2, 4, 5, 6, 0, 1, 2, 3, 6, 3]])\n", + " \n", + " ranged = tf.expand_dims(tf.range(1, bsz), axis = 1)\n", + " a = tf.add(neg_idxs[1:bsz], tf.tile(ranged, [1, num_negatives * tsz]) * tsz)\n", + " \n", + " neg_idxs = tf.concat([neg_idxs[:1], a, neg_idxs[bsz:]], axis = 0)\n", + " neg_idxs = tf.reshape(neg_idxs, [-1])\n", + " negs = tf.gather(y, neg_idxs, axis=1)\n", + " negs = tf.reshape(negs, (fsz, bsz, num_negatives, tsz))\n", + " negs = tf.transpose(negs, [2, 1, 0, 3])\n", + " return negs\n", + "\n", + "# b, t, d -> b, d, t\n", + "y = tf.transpose(x.copy(), (0, 2, 1))\n", + "neg = negative_sample(y)\n", + "neg.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "TensorShape([Dimension(11), Dimension(2), Dimension(10), Dimension(7)])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "targets = tf.concat([tf.expand_dims(y, axis = 0), neg], axis = 0)\n", + "targets.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(2, 7, 10)\n", + "WARNING:tensorflow:From :5: conv2d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv2DTranspose` instead.\n" + ] + }, + { + "data": { + "text/plain": [ + "TensorShape([Dimension(11), Dimension(2), Dimension(7), Dimension(10), Dimension(12)])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "b = tf.shape(targets)[0]\n", + "print(x.shape)\n", + "x_ = tf.expand_dims(x, axis = -1)\n", + "\n", + "x_ = tf.layers.conv2d_transpose(x_, prediction_steps, (1, 1))\n", + "x_ = tf.expand_dims(x_, axis = 0) \n", + "x_ = tf.tile(x_, 
[b, 1, 1, 1, 1])\n", + "x_.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(11, 2, 10, 7, 12)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pickle\n", + "\n", + "with open('convtranspose.pkl', 'rb') as fopen:\n", + " x_ = np.array(pickle.load(fopen))\n", + " \n", + "x_.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import math\n", + "\n", + "jin = 0\n", + "rin = 0\n", + "for _, k, stride in features:\n", + " if rin == 0:\n", + " rin = k\n", + " rin = rin + (k - 1) * jin\n", + " if jin == 0:\n", + " jin = stride\n", + " else:\n", + " jin *= stride\n", + "offset = math.ceil(rin / jin)\n", + "\n", + "offset = int(offset)\n", + "offset" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(TensorShape([Dimension(220)]), TensorShape([Dimension(220)]))" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "copies = tf.shape(x_)[0]\n", + "bsz = tf.shape(x_)[1]\n", + "dim = tf.shape(x_)[2]\n", + "tsz = tf.shape(x_)[3]\n", + "steps = tf.shape(x_)[4]\n", + "\n", + "steps = tf.math.minimum(steps, tsz - offset)\n", + "predictions = tf.zeros(bsz * copies * (tsz - offset + 1) * \\\n", + " steps - ((steps + 1) * steps // 2) * copies * bsz)\n", + "labels = tf.zeros_like(predictions)\n", + "predictions.shape, labels.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "x_ = tf.cast(x_, tf.float32)\n", + "targets = tf.cast(targets, tf.float32)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + 
"outputs": [ + { + "data": { + "text/plain": [ + "TensorShape([Dimension(11), Dimension(2), Dimension(10), Dimension(7), Dimension(12)])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x_.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 tf.Tensor(8, shape=(), dtype=int32) tf.Tensor(88, shape=(), dtype=int32) 3 (11, 2, 10, 4) (11, 2, 10, 4)\n", + "tf.Tensor(88, shape=(), dtype=int32) tf.Tensor(6, shape=(), dtype=int32) tf.Tensor(154, shape=(), dtype=int32) 4 (11, 2, 10, 3) (11, 2, 10, 3)\n", + "tf.Tensor(154, shape=(), dtype=int32) tf.Tensor(2, shape=(), dtype=int32) tf.Tensor(176, shape=(), dtype=int32) 6 (11, 2, 10, 1) (11, 2, 10, 1)\n", + "tf.Tensor(176, shape=(), dtype=int32) tf.Tensor(-4, shape=(), dtype=int32) tf.Tensor(132, shape=(), dtype=int32) 9 (11, 2, 10, 0) (11, 2, 10, 0)\n" + ] + } + ], + "source": [ + "def body(i, start, end, predictions, labels, offset):\n", + " offset = i + offset\n", + " end = start + (tsz - offset) * bsz * copies\n", + " pos_num = (end - start) // copies\n", + " print(start, pos_num, end, offset, x_[:, :, :, :-offset, i].shape, targets[:, :, :, offset:].shape)\n", + " s = tf.reduce_sum((x_[:, :, :, :-offset, i] * targets[:, :, :, offset:]), axis = 2)\n", + " s = tf.reshape(s, [-1])\n", + " s = tf.pad(s, [[start, tf.shape(predictions)[0] - (start + tf.shape(s)[0])]])\n", + " predictions = tf.add(predictions, s)\n", + " pos_num = pos_num if pos_num > 0 else 0\n", + " l = tf.ones((pos_num))\n", + " l = tf.pad(l, [[start, tf.shape(labels)[0] - (start + pos_num)]])\n", + " labels = tf.add(labels, l)\n", + " return i + 1, end, end, predictions, labels, offset\n", + "\n", + "def condition(i, start, end, predictions, labels, offset):\n", + " return i < steps\n", + "\n", + "ranged = tf.Variable(tf.constant(0))\n", + "_, _, _, predictions, labels, _ = 
tf.while_loop(condition, body, [0, 0, 0, predictions, labels, offset])" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n", + " 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " dtype=float32)" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.array(labels)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "label_weights = tf.abs(tf.sign(predictions))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "log_probs = tf.math.log_sigmoid(predictions)\n", + "per_example_loss = -1 * (log_probs * labels)\n", + "numerator = tf.reduce_sum(label_weights * per_example_loss)\n", + "denominator = tf.reduce_sum(label_weights) + 1e-5\n", + "numerator / denominator" + 
] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "numerator = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,\n", + " logits=predictions) * label_weights\n", + "numerator = tf.reduce_sum(numerator)\n", + "denominator = tf.reduce_sum(label_weights) + 1e-5\n", + "numerator / denominator" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/speech-to-text/wav2vec.ipynb b/speech-to-text/wav2vec.ipynb new file mode 100644 index 0000000..8c0ae97 --- /dev/null +++ b/speech-to-text/wav2vec.ipynb @@ -0,0 +1,1112 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import 
librosa\n", + "import tensorflow as tf\n", + "import glob\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "with open('train-wav.pkl', 'rb') as fopen:\n", + " X = pickle.load(fopen)['X']\n", + " \n", + "with open('test-wav.pkl', 'rb') as fopen:\n", + " Y = pickle.load(fopen)['Y']" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# follow hyperparameters from here, https://github.com/pytorch/fairseq/tree/master/examples/wav2vec\n", + "\n", + "features = [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2), (512, 4, 2), (512, 1, 1), (512, 1, 1)]\n", + "aggs = [(512, 2, 1), (512, 3, 1), (512, 4, 1), (512, 5, 1), (512, 6, 1), (512, 7, 1), (512, 8, 1), (512, 9, 1), \n", + " (512, 10, 1), (512, 11, 1), (512, 12, 1), (512, 13, 1)]\n", + "num_negatives = 10\n", + "prediction_steps = 12\n", + "learning_rate = 1e-5\n", + "min_learning_rate = 1e-9\n", + "max_learning_rate = 0.005\n", + "learning_scheduler = 'cosine'\n", + "max_update = 400000\n", + "residual_scale = 0.5\n", + "log_compression = True\n", + "warmup_updates = 50\n", + "warmup_init_lr = 1e-7\n", + "batch_size = 24\n", + "epoch = 10\n", + "total_steps = batch_size * epoch" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# tf.compat.v1.enable_eager_execution()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "import re\n", + "\n", + "def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps):\n", + " \"\"\"Creates an optimizer training op.\"\"\"\n", + " global_step = tf.train.get_or_create_global_step()\n", + " learning_rate = tf.constant(value = init_lr, shape = [], dtype = tf.float32)\n", + " learning_rate = tf.train.polynomial_decay(\n", + " 
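The `features` spec above fixes the encoder's geometry before any graph is built. A small pure-Python sketch of the receptive-field/stride bookkeeping (the same `jin`/`rin` recurrence the notebook later runs inside `Model` to derive its prediction `offset`):

```python
import math

# (channels, kernel, stride) per conv layer, from the hyperparameter cell above
features = [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2),
            (512, 4, 2), (512, 1, 1), (512, 1, 1)]

def receptive_field(layers):
    jin = 0  # cumulative jump (product of strides), in input samples
    rin = 0  # cumulative receptive field, in input samples
    for _, kernel, stride in layers:
        if rin == 0:
            rin = kernel
        rin = rin + (kernel - 1) * jin
        jin = stride if jin == 0 else jin * stride
    return rin, jin

rin, jin = receptive_field(features)
offset = math.ceil(rin / jin)  # prediction offset the model uses
```

With these layers each feature frame sees 465 raw samples and consecutive frames are 160 samples apart (10 ms at the 16 kHz rate wav2vec assumes), so the contrastive offset works out to ceil(465/160) = 3 frames.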
learning_rate,\n", + " global_step,\n", + " num_train_steps,\n", + " end_learning_rate = 0.0,\n", + " power = 1.0,\n", + " cycle = False,\n", + " )\n", + "\n", + " if num_warmup_steps:\n", + " global_steps_int = tf.cast(global_step, tf.int32)\n", + " warmup_steps_int = tf.constant(num_warmup_steps, dtype = tf.int32)\n", + "\n", + " global_steps_float = tf.cast(global_steps_int, tf.float32)\n", + " warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)\n", + "\n", + " warmup_percent_done = global_steps_float / warmup_steps_float\n", + " warmup_learning_rate = init_lr * warmup_percent_done\n", + "\n", + " is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)\n", + " learning_rate = (\n", + " 1.0 - is_warmup\n", + " ) * learning_rate + is_warmup * warmup_learning_rate\n", + " \n", + "# optimizer = tf.train.RMSPropOptimizer(learning_rate)\n", + "# optimizer = tf.train.AdamOptimizer(learning_rate)\n", + "\n", + " optimizer = AdamWeightDecayOptimizer(\n", + " learning_rate = learning_rate,\n", + " weight_decay_rate = 0.01,\n", + " beta_1 = 0.9,\n", + " beta_2 = 0.999,\n", + " epsilon = 1e-6,\n", + " exclude_from_weight_decay = ['LayerNorm', 'layer_norm', 'bias'],\n", + " )\n", + "\n", + " tvars = tf.trainable_variables()\n", + " grads = tf.gradients(loss, tvars)\n", + "\n", + " # This is how the model was pre-trained.\n", + " # (grads, _) = tf.clip_by_global_norm(grads, clip_norm = 1.0)\n", + "\n", + " train_op = optimizer.apply_gradients(\n", + " zip(grads, tvars), global_step = global_step\n", + " )\n", + "\n", + " new_global_step = global_step + 1\n", + " train_op = tf.group(train_op, [global_step.assign(new_global_step)])\n", + " return train_op\n", + "\n", + "\n", + "class AdamWeightDecayOptimizer(tf.train.Optimizer):\n", + " \"\"\"A basic Adam optimizer that includes \"correct\" L2 weight decay.\"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " learning_rate,\n", + " weight_decay_rate = 0.0,\n", + " beta_1 = 0.9,\n", + " beta_2 = 
0.999,\n", + " epsilon = 1e-6,\n", + " exclude_from_weight_decay = None,\n", + " name = 'AdamWeightDecayOptimizer',\n", + " ):\n", + " \"\"\"Constructs a AdamWeightDecayOptimizer.\"\"\"\n", + " super(AdamWeightDecayOptimizer, self).__init__(False, name)\n", + "\n", + " self.learning_rate = learning_rate\n", + " self.weight_decay_rate = weight_decay_rate\n", + " self.beta_1 = beta_1\n", + " self.beta_2 = beta_2\n", + " self.epsilon = epsilon\n", + " self.exclude_from_weight_decay = exclude_from_weight_decay\n", + "\n", + " def apply_gradients(self, grads_and_vars, global_step = None, name = None):\n", + " \"\"\"See base class.\"\"\"\n", + " assignments = []\n", + " for (grad, param) in grads_and_vars:\n", + " if grad is None or param is None:\n", + " continue\n", + "\n", + " param_name = self._get_variable_name(param.name)\n", + "\n", + " m = tf.get_variable(\n", + " name = param_name + '/adam_m',\n", + " shape = param.shape.as_list(),\n", + " dtype = tf.float32,\n", + " trainable = False,\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " v = tf.get_variable(\n", + " name = param_name + '/adam_v',\n", + " shape = param.shape.as_list(),\n", + " dtype = tf.float32,\n", + " trainable = False,\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " next_m = tf.multiply(self.beta_1, m) + tf.multiply(\n", + " 1.0 - self.beta_1, grad\n", + " )\n", + " next_v = tf.multiply(self.beta_2, v) + tf.multiply(\n", + " 1.0 - self.beta_2, tf.square(grad)\n", + " )\n", + "\n", + " update = next_m / (tf.sqrt(next_v) + self.epsilon)\n", + " if self._do_use_weight_decay(param_name):\n", + " update += self.weight_decay_rate * param\n", + "\n", + " update_with_lr = self.learning_rate * update\n", + "\n", + " next_param = param - update_with_lr\n", + "\n", + " assignments.extend(\n", + " [param.assign(next_param), m.assign(next_m), v.assign(next_v)]\n", + " )\n", + " return tf.group(*assignments, name = name)\n", + "\n", + " def _do_use_weight_decay(self, 
param_name):\n", + " \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n", + " if not self.weight_decay_rate:\n", + " return False\n", + " if self.exclude_from_weight_decay:\n", + " for r in self.exclude_from_weight_decay:\n", + " if re.search(r, param_name) is not None:\n", + " return False\n", + " return True\n", + "\n", + " def _get_variable_name(self, param_name):\n", + " \"\"\"Get the variable name from the tensor name.\"\"\"\n", + " m = re.match('^(.*):\\\\d+$', param_name)\n", + " if m is not None:\n", + " param_name = m.group(1)\n", + " return param_name\n", + "\n", + "def gelu(x):\n", + " cdf = 0.5 * (1.0 + tf.tanh(\n", + " (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))\n", + " return x * cdf\n", + "\n", + "def layer_norm(input_tensor, name=None):\n", + " return tf.contrib.layers.layer_norm(\n", + " inputs=input_tensor, begin_norm_axis=1, begin_params_axis=-1, scope=name)\n", + "\n", + "\n", + "def cnn_block(x, hidden_dim, kernel_size, strides):\n", + " x = tf.layers.conv1d(inputs = x,\n", + " filters = hidden_dim,\n", + " kernel_size = kernel_size,\n", + " strides = strides)\n", + " \n", + " x = layer_norm(x)\n", + " # x = gelu(x)\n", + " x = tf.nn.relu6(x)\n", + " return x\n", + "\n", + "def cnn_aggregator(x, hidden_dim, kernel_size, strides):\n", + " ka = kernel_size // 2\n", + " kb = ka - 1 if kernel_size % 2 == 0 else ka\n", + " pad = tf.zeros([tf.shape(x)[0], kb + ka, hidden_dim])\n", + " x = tf.layers.conv1d(inputs = tf.concat([pad, x], 1),\n", + " filters = hidden_dim,\n", + " kernel_size = kernel_size,\n", + " strides = strides)\n", + " \n", + " x = layer_norm(x)\n", + " # x = gelu(x)\n", + " x = tf.nn.relu6(x)\n", + " return x\n", + "\n", + "def negative_sample(y):\n", + " bsz = tf.shape(y)[0]\n", + " fsz = tf.shape(y)[1]\n", + " tsz = tf.shape(y)[2]\n", + " \n", + " # b, d, t -> d, b, t\n", + " y = tf.transpose(y, [1, 0, 2])\n", + " y = tf.reshape(y, (fsz, -1))\n", + " neg_idxs = tf.random_uniform((bsz, num_negatives * tsz), 
minval=0, maxval=tsz, dtype=tf.int32)\n", + " \n", + " ranged = tf.expand_dims(tf.range(1, bsz), axis = 1)\n", + " a = tf.add(neg_idxs[1:bsz], tf.tile(ranged, [1, num_negatives * tsz]) * tsz)\n", + " \n", + " neg_idxs = tf.concat([neg_idxs[:1], a, neg_idxs[bsz:]], axis = 0)\n", + " neg_idxs = tf.reshape(neg_idxs, [-1])\n", + " negs = tf.gather(y, neg_idxs, axis=1)\n", + " negs = tf.reshape(negs, (fsz, bsz, num_negatives, tsz))\n", + " negs = tf.transpose(negs, [2, 1, 0, 3])\n", + " return negs\n", + "\n", + "class Model:\n", + " def __init__(self):\n", + " self.X = tf.placeholder(tf.float32, (None, None))\n", + " feature = tf.expand_dims(self.X, axis = 2)\n", + " \n", + " for no, f in enumerate(features):\n", + " size_layers = f[0]\n", + " kernel_size = f[1]\n", + " strides = f[2]\n", + " with tf.variable_scope('feature_%d'%no):\n", + " feature = cnn_block(feature, size_layers, kernel_size, strides)\n", + " \n", + " if log_compression:\n", + " feature = tf.math.abs(feature)\n", + " feature = feature + 1\n", + " feature = tf.math.log(feature)\n", + " \n", + " x = tf.identity(feature)\n", + " self.targets = tf.identity(feature)\n", + " for no, f in enumerate(aggs):\n", + " size_layers = f[0]\n", + " kernel_size = f[1]\n", + " strides = f[2]\n", + " with tf.variable_scope('agg_%d'%no):\n", + " x = cnn_aggregator(x, size_layers, kernel_size, strides)\n", + " \n", + " jin = 0\n", + " rin = 0\n", + " for _, k, stride in features:\n", + " if rin == 0:\n", + " rin = k\n", + " rin = rin + (k - 1) * jin\n", + " if jin == 0:\n", + " jin = stride\n", + " else:\n", + " jin *= stride\n", + " offset = math.ceil(rin / jin)\n", + "\n", + " offset = int(offset)\n", + " \n", + " self.logits = x\n", + " transpose_targets = tf.transpose(self.targets, (0, 2, 1))\n", + " self.negatives = negative_sample(transpose_targets)\n", + " \n", + " y = tf.expand_dims(transpose_targets, axis = 0)\n", + " targets = tf.concat([y, self.negatives], axis = 0)\n", + " b = tf.shape(targets)[0]\n", + "\n", 
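`negative_sample` above does its index arithmetic on a flattened (dim, batch*time) tensor so that every distractor is drawn from the *same* utterance as the positive. A NumPy sketch of the equivalent computation (an illustration, using a seeded generator in place of `tf.random_uniform`):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negatives(y, num_negatives=10):
    # y: (batch, dim, time) -> (num_negatives, batch, dim, time)
    bsz, fsz, tsz = y.shape
    # num_negatives * tsz random time indices per utterance
    neg_idxs = rng.integers(0, tsz, size=(bsz, num_negatives * tsz))
    negs = np.empty((num_negatives, bsz, fsz, tsz), dtype=y.dtype)
    for b in range(bsz):
        idx = neg_idxs[b].reshape(num_negatives, tsz)   # (n, t)
        # gather columns of this utterance only: (fsz, n, t) -> (n, fsz, t)
        negs[:, b] = y[b][:, idx].transpose(1, 0, 2)
    return negs
```

Negatives are sampled uniformly over the clip's own time steps; nothing excludes a draw from coinciding with the positive index, matching the graph code above.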
+ " x = tf.expand_dims(tf.transpose(self.logits, (0, 2, 1)), axis = -1)\n", + " x = tf.layers.conv2d_transpose(x, prediction_steps, (1, 1))\n", + " x = tf.expand_dims(x, axis = 0) \n", + " x = tf.tile(x, [b, 1, 1, 1, 1])\n", + " \n", + " copies = tf.shape(x)[0]\n", + " bsz = tf.shape(x)[1]\n", + " dim = tf.shape(x)[2]\n", + " tsz = tf.shape(x)[3]\n", + " steps = tf.shape(x)[4]\n", + " \n", + " steps = tf.math.minimum(steps, tsz - offset)\n", + " predictions = tf.zeros(bsz * copies * (tsz - offset + 1) * \\\n", + " steps - ((steps + 1) * steps // 2) * copies * bsz)\n", + " labels = tf.zeros_like(predictions)\n", + " \n", + " def body(i, start, end, predictions, labels, offset):\n", + " offset = i + offset\n", + " end = start + (tsz - offset) * bsz * copies\n", + " pos_num = (end - start) // copies\n", + " s = tf.reduce_sum((x[:, :, :, :-offset, i] * targets[:, :, :, offset:]), axis = 2)\n", + " s = tf.reshape(s, [-1])\n", + " s = tf.pad(s, [[start, tf.shape(predictions)[0] - (start + tf.shape(s)[0])]])\n", + " predictions = tf.add(predictions, s)\n", + " pos_num = tf.cond(pos_num > 0, lambda: pos_num, lambda: 0)\n", + " l = tf.ones((pos_num))\n", + " l = tf.pad(l, [[start, tf.shape(labels)[0] - (start + pos_num)]])\n", + " labels = tf.add(labels, l)\n", + " return i + 1, end, end, predictions, labels, offset\n", + "\n", + " def condition(i, start, end, predictions, labels, offset):\n", + " return i < steps\n", + "\n", + " ranged = tf.Variable(tf.constant(0))\n", + " _, _, _, predictions, labels, _ = tf.while_loop(condition, body, [0, 0, 0, predictions, labels, offset])\n", + " self.predictions = predictions\n", + " self.labels = labels\n", + " \n", + " label_weights = tf.abs(tf.sign(self.predictions))\n", + " \n", + " numerator = tf.nn.sigmoid_cross_entropy_with_logits(\n", + " labels=self.labels,\n", + " logits=self.predictions) * label_weights\n", + " numerator = tf.reduce_sum(numerator)\n", + " denominator = tf.reduce_sum(label_weights) + 1e-5\n", + " self.cost = 
numerator / denominator\n", + " print(self.cost)\n", + " self.optimizer = create_optimizer(self.cost, learning_rate, total_steps, warmup_updates)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py:1735: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :246: conv2d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.keras.layers.Conv2DTranspose` instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team.
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AttributeError: module 'gast' has no attribute 'Num'\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "Tensor(\"truediv:0\", shape=(), dtype=float32)\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Deprecated in favor of operator or tf.math.divide.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((1, 161, 512), (1, 161, 512))" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_x = X[:1]\n", + "logits, targets, neg = sess.run([model.logits, model.targets, model.negatives], feed_dict = {model.X: batch_x})\n", + "logits.shape, targets.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + 
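The `(1, 161, 512)` shapes printed above follow from the encoder's 'valid' convolutions (`tf.layers.conv1d`'s default padding). A sketch of the length arithmetic, for illustration:

```python
# (channels, kernel, stride) per conv layer, as in the hyperparameter cell
features = [(512, 10, 5), (512, 8, 4), (512, 4, 2), (512, 4, 2),
            (512, 4, 2), (512, 1, 1), (512, 1, 1)]

def conv_out_len(n, layers):
    # output length of a stack of 'valid' (no-padding) 1-D convolutions
    for _, kernel, stride in layers:
        n = (n - kernel) // stride + 1
    return n
```

An input of, say, 26065 samples comes out at 161 frames, consistent with (though not uniquely determined by) the shapes above; the exact clip length is not shown in the notebook. The aggregator's explicit left-padding then preserves that length.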
"text/plain": [ + "[]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAxwAAAEyCAYAAABwPtZnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOy9d5hcx3mn+1adczrH6Z48gwnIgQRJBAJQehStZNOWkySnXa8tW/eurLvXXlu21/Z1uLvSrsPaXlu2bMtBu7ZWwbIkk7qSmCRRAEgAJHIGBjODyblz9wl1/zg9PTMIIikiDMl6nwdPn1Cnqk43ps75qr7v9wmlFBqNRqPRaDQajUZzO5B3uwMajUaj0Wg0Go3mlYs2ODQajUaj0Wg0Gs1tQxscGo1Go9FoNBqN5rahDQ6NRqPRaDQajUZz29AGh0aj0Wg0Go1Go7ltaINDo9FoNBqNRqPR3DZui8EhhHi7EOKcEOKiEOIjt6MNjUaj+W7Q45NGo9FoNHcWcavzcAghDOA88FbgKnAIeJ9S6vQtbUij0WheJHp80mg0Go3mznM7Vjh2AxeVUpeVUjXg08BDt6EdjUajebHo8Umj0Wg0mjuMeRvq7ASGl+1fBR68tpAQ4gPABwAMjB0REiAABcKyULZdL1g/FgqC61JpD2LlQTgKkS8t1oVSChEKoipVhJQoz1vZnmmiHGfpQDwC+RLCNMEyUeXKiuMqHkGWqyjHBcBLRQCBUaqhaja19ijBGRsMuXQtIAIBVK3mbxsGGAYohXJsWFxMEgJe4MqSMAzsdAhrvgKKxn15yQhyvgiAikcQ+RLCNBr9BcgzN62Uan5BDWk0rw6ed3y64dh0q3kRYwBcMyZ+p3KG4Y9n1dpN6xdSghD+WFIvc7P6hWWibOe640sFWBrXXgR6bNIAZLNZ1dvbe7e7odFobiFHjhy54fh+OwyOF4RS6hPAJwASokk9KN7snxCAAzIawSuVlo5V6xcOg/vGBzCeeJa5f7uX9N8dWFlm8QEo/MPGxnWIXAFnbBwEGFs3glK4p8/7ZVzw9t6HfOoowgqgCjVkLIpsaUZNz0JrFvfCZViAqZ/fS9tjE7gXLmOG1lB8ayvBRw412pIRv89GcwZ3egZjbb9ftrsLZ/gqCJChEF6l0rhGWAGUXfMNItf16ygWG99T9e27/DaA6rt2EX7ytH9+wb9Hed8Wapkw5mNHwAUjmYC2ZtTYJF/L/e3grf7dNJpXOjcdm14KiwaGNMCrTwqIF3G98wLLeyyNlTcrr1g2+fE89b+Qdl/gfSyOdQjBo95n9dikobe3l8OHD9/tbmg0mluIEOKG4/vtMDhGgO5l+131Y8+LME1kJIKbyy0ZG3WMjeuYvy9L+ulReOJZALL/chr3RhUtwz13ceX+qXMgDcy2VgiHcAYGkU8dBfAfhoBXLOJdLmKk076xUaf5Lw402nOuDLHwri5altW92Gd3esb/vDjglx2+2iizImZGGg1jAyFBOXjFIkYiQXXXeszHjjSMDYDgw4cQqeSS0QJ4R0+v+BHdXA5yuef5VjSaVy3f9fj0klj+d3/N6oaRzVC5rxfr0SM3vFRu2wQSvONn/fKJBO6mHnjmBN7r7icwNE1xSxuRwQW8aNAvMzmPNzWD6OtGCUGtJUpoYBo3FUOWqpT600QGF6BmN8Ypcf9W1HOnGtuYEnXoBMbGdTA9izszi5FpQhVL/viz5148U2LNlvDCFsZ0DlUsIaIRVCiIk42hTIE1nkdZJk4qhDW+gAoF4cTt+JI1Go1Gs1q5HQbHIWC9EKIP/0H+XuD93+kCYRqYa3qgZoPnIdZ344Ytqt
kAsqoILNRY6AojHcXpj7QhIlmymTzW3zUxdb+k60mbgR8SRC9bJC97LPRLuh7NcemH4qTPQPOjg6hkjPzGNKGpKoHReUrrs8xuDiDcTpQJTghiI4rouE0tYWBHJInBCnPrNjG/ReEmXPo+qxCOYvT1QZywov/zec7/7Q5SR4LM32PT/YhgYpfB2n+axUmEkI5HJRsienKM6roWAiMLoBSiXEVVa4hICLWQQ0SjqHwBEjHfPaslw8BPQM9HeshVghSOZ3DWVEjuDyE8yPeAlReUuh2ajhnMbndpe0oSv1LGvDAK1SoEgzB5G35djeblzYsen24p3vVTJO70DOGTJouOS2ZnB87IKEamCXtLD8bYPKJqIzJNzLxzI8nLZabuixBatwfpwEJ/F+FZF5TCmM5BuYJKJ5CJOJ4hkPkygWlwBgYxuzpxro4QPAeidw3O0AhIAxmwcKIW5n1bcKMB5HPnce9bD/u2w1wJe/MaSh0biA8UmdwZo+0bM7BQprSpCSdmERorYHc2YQ15qFgE98xFrFI73uwcbrGIDIWQlQre4iqHRqPRaF5V3HKVKgAhxDuB/w4YwCeVUv/vdyqfEE3qQeNtN3wYL6v05j7JwSCqWr3u+OKD+8VipNO4c3NL9fT14AwMYjQ3405NXV8+kfBXFpYfSyVx5xdWHss04c7M+m4VcNP7XXwpWMFyV4zl1L8XI5vx3b9OnWucelR97ohSaud3uleN5tXGixmfbplLlWYFemzSAOzcuVNplyqN5pWFEOKG4/ttieFQSj0CPPJCywtD+rEHAcufnQcQkuoD/YQGZlD5Im5fG+X2MNaHxrl8sY3USZPiawvEvhEluKCQtmJij6DtgCLfaRCe8VBCYFbXkDw5i5MKY5RslCnBVdQyIaSjKHQGcAMCLwAoaP32LIX1ScLjFTzLwI6bFMOS4js7yR4vM/5T61EGdH5sP6UfeJC5DQZmCRDgBkDa0PHELF7AxCi14oUt8MBJBTGvzOBtXkPg0ngjsBxpQLWKcl1EIAABC1UsMfjb+xAKfuw9j/Ote0Oc/4sHQMGmjxcZ35ci36cITwpqCYV0fSfq5qMOUWsLolhBFErw4m0tjeYVz4sdn245yydP6hMJ14pamO1tftwZILdvRgyM4OZyGK0t5F7bR3DOITg8x+TrW2k+PI+SEuF5iHINLxnBDRpYEzlqnSkCxwbw1nYir4whYlFUMAAzcw3XT6O1BXdiErVvO9bILG5LCnXohO962t1B/r52YmdnqXYlQYE1V0G4LhP70rQeXEAUK7gXLmNs2UC1LY4blkSu5KBad1FNRZEDo4hwGAIWzsDgdxVortFoNJqXL3ctaHwF0kCkkyjLxG6O4cQsKk0G4Umb+Z1tCFdhVjyGv8/jofQoP/z6Ixy6r4/Tc62M3x/EWDBxox5btwxxKtmJGamSmw9A0CMwahGajhOcKJDfmGLyAYnwQNYEwVko9PpPPrMk8AxF7V0ZUhdcZrdEsWOC2IjL1H2S73nHIY7OdFEYsDAKktxX1jI1V0UMhdn07gs8+9xattw7xOnBduzjEZQUzOxLYOUVbgjsmKDNa8KcKVNb1441No+wHV9VKxpBBAOofAEhBKSTpHZN8rN9T3Gp0sJvXn6Wf/epfbghxeUfTCIdiKyfp9QZpLt5jpHDHbhdFThq4cSDyJCJcZd/Uo3mVcc1geEyHsfL55HxOMKQSyueQgIeRjzuq+uZZmNF1exdg3NlqGFsAHjHzgBgrOtDxcJIVxGcKKBGxmn9Wo3q2hY8UxAaySFyBWprUgRmKqiRcYwLl/FME1m2oTWLnQojSzaGHcdIJRCVGpgGxub1MFfCGRyGwWFkPI5MxFGlCmbZw4uHMMoO5mQOFQqS25yk/aujoBSTb+qkeWoGRicwT58nEImgDAPZlKKyrgWzaONs7MacK4GU/sRK5dovT6PRaDSvZG6LS9WLRbst3B6024JG89J43rFJXCPRJCQyYPly2K6Lcr
26JHZdftY0fUOkUEQm47jTM4hgEKMpvcLIeKWyuJqixyYNaJcqjeaVyB11qXqxCMNAWEFkKom9vgOzUEMUytS60gSODaC6W5FT88y8uZfUTw9z8bluvIyNORogOCtIDrhM32sQnAP39Qt0/z8ul96fxigLuh4rEbg6C65HfkcHxVYDs+y3mzkwweTrWyl2CXoeyVNt8hVegpNlhOdR7ooxu8micE8VMWdBcxXrUthvJwQocMOQvOBRbpbkNjn0f8YlMFmk3BPHLLoIpRC2hzldoLS2iei5KbzRcZTjIFNJVLnif+YLACjHQXS1c+7/aEYJ2L3zPE+fWEfylIm0FZWMQDpQbvPwkg7hRIXI/xcnNuLgRCRWwcXK2ZjTeT+fskajuSMIw/AV56T0xSGiEaRloUpllFLIaAS3qxk7bOFEDJQpcELriF/KM7cpQb67n+K2CswHEE1VDMvFND0q41E++rZPE5VVQsLmRKWbJ6c3slALMTiW4d6eEU6NtKOUwDBd1EAUO+FiFA2CcwIUlHocMD0QEDsXQEmwYwo76SFSNYShSMZLNEeLJAIV4maV92QP8x8O/Sg/ufVpFpwwpxba+f62o1wotzJdi7EmPMvB6T7OX2xHRh1iR8J4JggPQtMKJcGJCkIzfrvBBT8GrdCxFj7xubv8a2k0Go3mTrIqDA4EyP41iLkcwlWUumMIJ8rcBou2J+eoPbiO0vY00zsUuxOT2NsNZosRsn/t4vzWHBNPdqKkori7DOUAl94XInkePFNRSwdQVhZZ8zALLuVtJtFRhXDhyo+2EZyH6FXFuZ8Lgi1pes7AyppEx2wW+iwqzYpQrEpZQX/bDMWmAADlmkXTx6Ns+N1TfP3QveB4RK+YXPk+ybpPB/zEhI5Hrj9MJSOITISJjNu4TTEMO4sqFBGW5fs1O44/IyoFwjJBCKyOIrWxKH/U/SWGO4L8WO4XUFL592UJ9u47wX3xIf77v76bzPuGydWCRH4/hR03sAoCFQ7c5R9Vo3kVsGyFWDnOyuSi10h7e/k8jE8ggcW/ziB+6oz4UUhl/fw9DZWqZSIVf9/5GlSpDG1Z3LMXMTbECV0dY31xkDKwfksSJqZRNdt344pEEAELTNPPCZTNIILBlSIaNxCiUPgpfhaAP7Xupc8+xlNWoq4sNco/B7vrAh15Jtu6YHKMzekSIhYFx8UZGW2IeAgrAMpb+Z3U71mj0Wg0ry60S9UrGO22oNG8NF7q2LSU6HMpUHx5sk+jvQ0MiSpXcCen/JWRegC52dWJqtZQnc2IqxPU7ukheH4cVbNxp6aWDJOtG2F6DpUv+LmAbmBIGOk07vw8CInZ0YY3PbPCQFpu3JjtbTjjExjZLN78AsquNZTzjLpRBL46oNjUj3fszHVB7w2EwFjXB9NziHQS5/IVQI9NGh/tUqXRvPJY1S5VGo1G87Jm0aC41rBYfAlXqh67UUDZNYx02jc08nloyaJiEcyAv+7hpWMIoJYMI586Sum1/VgtEUIjeXIPdlOLSqxyP9JRqN3dhMcrkGjHnClAKoIyJOZsEScTxTx9BZFO4SUiiLWdyFwZ5/wl2H0PsmQjDIGczeOOjSN2bqPSEkYt2JjRMG42Ti3Vi7VQg4kFam/fhZWrsfDODYSnHeyYQerwONW37ADADRtEz0zB5Ay0ZlEj4/73kSug1rQhFoq+4tbQOMzejR9Jo9GsNmqOx5HBOfauzdztrmhuM6vS4FhUagEwu7tWZOq+UX6L52Nx9u1G+TK+G8oP7Sb8xWdeUh2Ls5wyFALAq1YRhnH9LKEQyO2b8Y6eBkDet6WxrdFoVgmLK8XLXayuSXDn5fON7eV5fphfwGhtwZmY9F/Qh/06ZP105AtP+9cA8ckb5wICYMsGOHwa4bm4gKhfw7LxsrHu8cwJvMV+LR47fJIgIONx3HweLkK4njvIAQL11YnUfr94EHAAq37cWl7/8nG2WISJyaV2NBrNqx6lFP/87AiPn53k4R
NjPPwLr2VrR/Jud0tzG1kVBocIWBjRJJgmIh7Dm56l9vZdRM6M+9lrAWyb6bf2k+8VpM95jH9fldDpME5U0f6UQ/T0OCoY4NwHm+n+mosTlUzslHQ9YRN66ozvCtDdgjx9mckfv5fmZ3LM3JcgczSHLFZwz1+i/NBuYufnsFtiVDIBIqNlqk1BBh+CDT93iEv/bS9djzss9JpUf2IvxU5B50f3c/4vdxG9bFFu99j00QHczizi/BAylcRLxvBiAcyRWdxsElm1cc9eQiZiiHAYb24emUrC3DwiHoNqFREKcf7/7scNe6z/0NOc/8UgiYP7aPmz/ShDsPBjeyh2SuyYIjYERhXCMw6RQ1caLyMyHoeXbltpNJrn4zskJX2huBOT/ka9nmuTj8p4HBGwKD/QQ+RsCBWLIEoVsJ1GklD39HmM1hYoV/Cq1ZXJUBcThDY3Q3MaNTzmx3pEo3jF4ooyXj5P+ft3E/6XZ3BnZjHbWv1YkOYUxtg0zvgEMhRC9HWjro7jbF+LUXEQNQfv+Nkl9y2lEFYAYUi8SmVlWxqNZlXgefUJDimep+St5XNHrvIfP3e8sT9fsu9o+5o7z8smhsNY14d7ceCWt33tisniysO1FH/wQaKff/oF1dnw277Z+UgEr1Squ1zYz/uyYmzduCKD+AtF+0lrNC8NHV92e9BjkwZ0DMdq4N1/+i1OjuS48tF33dF2//Dr5/mTxy409v/XzzzIa9Zl72gfNLeHVR3DIaQEYWAkYiCkr3DS24V7+jzy3k3IfBmnbmw4b96B+diRxrY1X2FuS4LUpw5g9vdSXdPEQl+QxFCN0OVpP6utNJDRyAqXBiObwettxz18cqkfVgDZlALAnZppBF5e/the+n/lQKNc+aHdRC/lYHCEyms2UYtLEhfyS65OUrIcI5vBnZnFiMfxqlW8UgkjlUS5HtQEMhzCK5Z9VRnXBcPg8q/dS2wIsn95APfUOfI/uodazJ+BSF+soIRgZluI9senYWYeZ10H1tg8zsAgMhxG2Q5cbzdpNJpbRT0HR0MO1zD8pHYBy/80JMo0fAU6wEuEwfOoZsPYMf9YOSvxDEE1DcoA6YAXgNA0KAHlNkXmuC8xW00J0udryKqLdWEUqtXndS81Ozt8qe1QELsrQy1hASBtj3KzhVlR1OKSpqNzXPn+JrInXQptBsJTNJ2rUmoNkP7WEF46gcwXcUfGkL3deFeGbxwkfu1XVHdnFTu3UUsFyfUFyPcAv65lcTWau43jepwc0a4QmjvDy2aFQ/Pi0bOIGs1LQ49Ntwc9NmlAr3Dcbb50bJRf+KfnAPjHn3mQfXdwheHaFY43b2rhb/7NrjvWvub2cbMVDnmjwqsds6vzu7rOSCT86/t6bmV3/Lq3bPDr7ux4/sLSWLl/bbbiF8HtuBeNRvMikYtJ/wwQAmGaflbxUMhfOY3HAV9KVsbjGIkERiqJ2dmBsXFdoxrnTTsQO7ZitLZQfecu5L2bfEWrZRjNzYhd98Due/Bed3+jvQZCIO7f2si9sTheLo5/jWLWslw9e+7FWNeH2d/rx4Esv7W6sIW8bwv2W3b449WyMWyxHrOvB2PrRmQodF0di22vaFOj0dxVas6SlMNffevyXewJPHZ2kvGFm7uia17+rAqXquu4JghTxuMNdyijtaURJNkofo0G/KJm/LUsKlS5w9efe6m4p/203isSa92MazTyX0rAqTMw+F1fq9FobhGei1omw7Q4HjU+63FhwjQRhkTVbEQwiJfLw8ysb5zEovD4EdzX3Y8olQlNlpGzeVRzE879/Uw+EGTN/7zE4E+tJTypqDQLAvMKsWEP2WdzDPxwgvgViEx5GBWP0vYNpC6UoWIj2tKUMyHCR4fwuluopYIoQ2DlbGTNQQHC9ShsbUW4CrPchXAVbthgdmMA4UJ0wmXk3Q7mW/YQHRXYUUgMepSzkszJKvlmCyckaAqYVJvDmPl2hFLIQg1RKEEuh9HVjptNwDPapW
o1IoR4O/DHgAH8tVLqo9ec/yPgjfXdCNCilErVz7nAifq5IaXU992ZXmteKSjuvseN5vaxagyOxgxYxp/Nc1MxZKWGFwshp3PIRBy3Lc25H4+z6WMCZ3wCgPOf2EXf5xS5HoumkyXmNkeITLoEepsJjMz58rpKIaNRWLcGMeKrwVS391JqtagmBMkrDnZUEr+UZ25bgsRABWtoGm9mFhGwsO/px4maOBGJUfYodJm0fn0E58oQI7+yDzcIVhGCswqzqkg/M4EancArlbDfsgPr0SMNeV+zqxOVy6PqajCLWXnBn/1UtoMRi3L1Z7dRTSvav+1Q+dAcyXdeXPF9uW98gKGfc+h773Fm/t1eKllBx1MljOOX/HqtgI7h0GjuNtfk51ih0nRNJvLFeAz5red8CdnDJ1mcRjHPQ8fjvgxt58cmrmtGAX3PrTwWXHYO/OzmLsDEpK/8d815BwhdM4EhgdavLe1v+Pz1t7i4bhJfVt/y+j2WpHedgUHE8Kp57GiWIYQwgD8D3gpcBQ4JIb6klGrosCul/sOy8h8C7l9WRVkpdd+d6q/m1vLEuZvIbd8mHj7+AiZnNa8oVs3I38iPsUy/fXEdoLF2MTLKuiPL9oENHzgEwGLKmKal2O4V5bxiEY6daeybj82w3MEggP9QTB695tqS/wIQqJeBJf15gM6P7b/+XpZtW4/WA9zruUSuXXlZLl25uO3m87T/wVK9wa9c1wTGE8/S94S/nfmbpZtefLDfSGlLo9HcQhaDxk3/9VoYEiwLPA8RDvnB4p5CxCKoSAgv4A+3wvOotkaRNf+vtdgRwKgpim2+m1L2aIm5zWFcSxCa88j1STq+WeLSjwRJnjUotStanvVwA4JA3iXXY1Jqh+7HKuS7gkgH7IggPOdSi0rMisINClD+8VK7wA0rwuN+//N9Ht2Puf75mIFwFdJVhKZqFLpCBBdc8p0mdlyQGHIpZyTVlMANwJpHFpi7J0Fo1kXaHqGRAsJ1qXYkCA7Po8IBatkI1bSJkoJimySQU/C3n77Tv5bm+dkNXFRKXQYQQnwaeAi4WeKn9wG/dYf6pnmFcWnqeonsTx0Y5Jffvuku9EZzJ1gVBoewLMw1vQAUNzUTGcyB5zH52izJyzWqKROjqhjfY7Bm71W2psY4ONHL/OFmnKjC6CoRDNq4rsQ0PAKmiyE9pk9naTuoiF/M4wUMqpkQCPAsQanZoNwsKK2tERiziIwJEFDJQPvBGvP9FrExF7PoIj4yxeXLrbxnxxEe+/s9tH9znrP/ZxQZcmA6iLQFXnuFlmwO+59byBwrNNoqtJs0nS1TaQ5g5V2CVxfwIkGMsWkIBlCV+uqGlP52wEIt5Ki+fhsf+NPP8+sP/yhr7hkjbNqcudyBKBpk+ueYnYsSiVWpnUrihhTCFTQ/q4gPFJFVBzk5B3oCQaO5fdRdIReNe+UIqNm+y2SptJRxfD6ADIeQlgmui0inCJgSWaiigibJhSqyVCX5VB4Vi+C0JklequFEDOyoJHnJwzo5wDqvF3O6gBoawatUMNtaccYnaEskUI6D2rqWzJMT2D3NGIUqwnYR+RLe3DzCMqGzDTGfR0VCkCsgQkGcwWHa6zk+aMlQ2OCvMMuawrMkqaPTMD2LUe0leGUGtylG4tk8Kl9AhMOgFJmvTuGsbUfsP4ZIJcEwCAwMQzKBLAcIzeQIFkuIRIxkJISo6MmQVUonMLxs/yrw4I0KCiF6gD7g8WWHQ0KIw/jzcR9VSv3LTa79APABgDVr1tyCbmteKfz5k5e0wfEKZlUYHMq2cS5fQYZCBC9faawQZM74CgaLnez9V//zDFD+cAs9f3z96sJykvhuSB5+4GOwUmnkwAg/T5+al+88ChsY4kymidaZ/XjAhp+F2vfsJPDVg9dceQlYcmdY/Fxs77pVm2tYdLGyvnaYv93Uyzp1sHHd5s1R3DMXMNvbaBo7f9O+L3dh0Gg0dwilQLlL24uH7Rru8hXH+QUYWL
kSunxbXPTHPBMILTtvnhxAZJsY++kHCOQVlYwgPtyL8EBJ8ExBaqFEsSuMHYkQnnGJDBiQiuMFDWSujErFYWIaEfSNDbFzG+LKGKSTICXRr59EuS6qWsXYsoHpvc1Ex1JYOZvRd3bS+nSO/H1tWHmX0OVpRt/ZiVlWOBFBeM0ejJrCyjkEp0p4UlJaEyU8UcGctPCmZlDjtZUJCTUvV94LfE4ptfy/bo9SakQI0Q88LoQ4oZS6dO2FSqlPAJ8AX6XqznRXcyPubKo/zaud1aFSVXdN8J7nQWS2tWJsXg9A2zXGhtnfe3353qXZk8VEfN41ftM3bau97bpj7szs0s7uewh89TBGNtM4JO/b0tgWweBK5ZhrkJFIvaBYKm8FGg9j5807VgaT77mX8Tf4knUqEfvO9UkDEQxeV0aj0dwhlivRCbHi79Fsa8XsXYORSiIjkcbYZSQSGNkMRiKB2dON99r7MNb14bx5B/n37mH6PVsp92fIHi8hHUXHk/MA2FGBkhDIuTiZGMFZm+ZvjhEeK1FeE8eNBxG2S6UnjZMMoTqaUY6D97r7MWYLuP0dMJ+n1JOg+NZteDs3U/qBB6m2xfFM381KBSTRcReZKxOcs5GOR62ridioSzDn0fp0ntQTlwnO2QTmqyhTgimJjJaRFQc3GUVm0kvjlGY1MgJ0L9vvqh+7Ee8F/mn5AaXUSP3zMvAkK+M7NKsQbe1p7iSrYoVDCIEMhsCy8PJ5jEQCkUljt6cw58vYTRHckIEtBYUOC+djEts1qJ5J0vPVCtP3hDGqiuKPd5C66DGzTSAdiIxCeLaD5BOXQAqElHiZFIX1SQCMmsfsRovkgEstLkmdLzK5M0br03lqQQM2tFNpChAZK3PhgyahCyESVzzft3rBQ3U/iHQV09s2kj3hkO82Kbx3L/2/9gyyuwMVC+OkQgSuzlFemyE4WcaYnEPVA0eNQBJVs8EwULW6io0VQCbjzPQGKP36Pswy5Nc7bP7Vc7QW2im/axcomP6Bfaz50jSX3peh74t5ODOAjEQQvV3+C06uCEN36xfVaF5lSAM8tyH8AL663uKkg4iE/b91KfBam/CCJjIZxW4KY81VkNs340QCVFqCxPYPoCwTa6oAQDVt4pkwtwWq6SDpczbBBZdiT4zE2XmK/UmKrQbjewWZYxGi4zYze9swK4rE6VmwHRYeaKUWFQRzJm4gTDQTYX5dkEBnG6FZh+ruXoSjkApKbUGsgkupzSKY85MOVjIBAnmXcm1LBzIAACAASURBVF+auY0BpK0IzisCBY/pTSauFSUhWrFm6hM6UlJLBlCmoNhqERupgSEQqRj2jn74ulapWoUcAtYLIfrwDY33Au+/tpAQYhOQBg4sO5YGSkqpqhAiC7wG+K93pNcajeZlwaowOJTn4VUqCKV8v+Q1LZgjM4j9x2DrRgIjcxQ3tRCcq5L+2mFGs/vo+P39lN7zIPP9IVr+x37Mrk5m3tBN4h8PNoLBjeZm3Kkp3x2hswNnbALZkiZxbILixmak7dH+h/5KifGOXRizRVr+x3FkNkN571pCX36G0BvuJ9cXYfNvjeJcGUKYJjM/sYvZjQY9f3WOqz+5ESsP8eMThL84SNv2zRAOQb4IV8cItrfiDAwSKVdRxSLO/AIyFMKr2RixKKpW86Uz6y8sRioJrVliYw6ZvznEpT/Yw4YPPuO7XMwvYDY/gPHEs0Qze3FSYdqecbn0I3HW/UM3slDCOX1zVyuNRnObqEtdL3cXWpTyBmBubml7wlfK8/C1RxfdHwW+66ULMLWkGBO9OABA8gbNukDolO96lVl2PLDsPECsXsdyMk/c+FYW12JuljGj9asr97u+TCPPyPJ7Xny4pOqfiusVrDSrB6WUI4T498BX8f9rflIpdUoI8TvAYaXUl+pF3wt8Wq3MGrwZ+EshhIfvOfHR5epWGs0rkYlchXjIJBJYFa/Sq55XRabx5Xk87gYr2r8mx8jtQFgBlF3T2X
w1mpeIzjR+e9BjkwZ0pvG7zeeOXOWXPnussX/lo++6I+16nqL/1x654bk71YdbQe9HHuberiRf+vevvdtdWVW8ojKNv1juprFxXft3wMDTkrgajUaj0Wi+E/mKfVfa/dcTY3el3dvB8asLd7sLLxtWxTqQMAxkKAJC4BWLmG2tIAS1de14AYlRcnCiJtJRXP4hi+xhSdOxHJd/JEHfrx5g5mf2YpUUyfMFxJkB7F0bufRjBj1fgOAjhzD7eqBcgUgYb2oG9961CNul0Bsl9pmDzP/kXlL/cACxcxvq8EmMjesQxTLze7uIf+EI5bc/gBMWxC8WGH9tEqGg7VtzuJEA1sA4zvgERiqJu5BrJBkUnW2okXFEJIJqz+Akw1iTeUSu0JDCVbUaSIkIhfDqSb+E5f8kXqnE8H/aR++brzD8cC/dD/suFipoMbUjSeZvDjD3U3vJPnwed3oGwHfH6mhFLBR8N63xu/BjajSa7xp57ya842cxNq7DPXfxuvNmXw9OPTmf2dmBM3Jj7WsjkVjKbXQbMNvbcMa+wwBzB1ZyNRrNS+O3v3x3vN7KtZvpdGpeyawKg0O57gr1qMUs4nJsvLEEs+j3u/7J+jUsZdbN/PWBxjEFGE8+y4Ynl+p3rsmeK77tZ/eLPePvp/6hfv3hkwCNB33ssyMoIPSvzzTqb6236eH7XC/+2SxmCYZ6ksHzdTXAYhGmppCslL5cwbIVkOWrE92/tx/396CD0RXXZurJCdN/f2CltOb8gi+5qdFoVhfLXsAXpa9vWGxoDCPT1BiDZDyOVyz58V1bNjB3TxPpcsVPMOg41N6+i+Cjz/k5MXo6YXAEYUi8vi7kuQHY0AsXBhE9najBEURnGxjST8x3dQHv0hWU4yBDIWRzluK2diIHLyKScXA9VD6PqlRxH9iIcewiMtuENzuPMzZO+aHdxA5eQUQjuCNjyLU9MDaFSMZR5QrCslC2jVuPWdFoNJrno1xzCQeM5y+oedmxKgyOVxpmd1cjs/i13PBlox4wft2nRqN5eSEEwrQQhkS5HkgBnkK5LsIQyGQCL1dARiLIthZQCjWfQ6QSeNEwVKqU1mfJ9WwkNKeIDZWZuj9KYsjBjknCkzZeNo2TCaOkQNY85n90J9ExG6PsINetAc9DFsqIZALXEIj1PVAoU9uzCTtmUGoxCOQUnpnC3fwARkVhVFycsEHo8eNM/NQDBBcUtbggcaXGzNYgRlVhrb2XWkLQ/qkFjC0bCM7UsNd3EBidx2hK4yRCkOhGVhxEwMLOxjDPDWNsXAezC7hTU8hIBLWxD57VKlUazWrj1OgCWztuJE9x5/jBj+/nkQ+/7q72QXN7WBUGhxACI5tBJOJ4o+PIdApnbJzKu3cTOzmOCliU+5sIH7nCmd/tY9MvnuLKL20nOAsIqKag/x+uMvaOTgJ5xcQ+j94vethxg8TpObwLVxCb+/GOn8V+206srx3G7OnG7mgi1x8mfSrH+GtTdHzhCgt7uokOlzAGxnGnpmD3PYy+Pk6xx0WkanR9xiI4W6PQHSKw4DKx26LzGxWMbx6j9NBOCh0GLX+235e5TIQwyjZyYBR3ZhbvDfcjv/Gc7/pkGCAEqlpDVap+pl/DQFWrKMdh/P/aR+enznL2NzdgFQTJuvhU8lIZa67M0LszBBb8hFuFtS5Nz0oyp4rwzClkwELEojD1Hb92jUZzqxAChEQYBihfqEcE/HVZ5ThguyjPz+UjTBMMA+fyFYzWFlS1irtsFTZ4cWBF4tGWA2C0tjRWChZlgBZZ/nqwuMrbYGy8sW9eHMCEmyY9NerXZj9xYMXxtkdXlnMBTucafWg4R9RdrBZVt+RlE9dxYGZ2ScWqVILnTt2kBxqN5m7yy587zsO/cHdf9k+P3T5X0FvJbFHHyr5YVoXBoZTy4xBmZkEpvLFx5PbNhP71mcbDLHD+Ei6w4een8IDkRY/k/1rK8l343t00f9x/UCb/Jxjr+wlcuNxwORIDfv
4i62u+IoYzOIwcGSN1CDzHoeWo/+CMft73iXapr1Q8c4KW2A7M3z8C+AkGnctXiNfrXfMVYPc94LlEvvA0EQBp4B074z+QzfpDF5Df8P2xlrtfLa1qBBv5OWQoRHBOceFXNrL+w/49GVs2wNQc3vwCrl2j85pn9nKfb6/iQj3RoUajuc3UjQ3w3UMBVKXiJ+CUwjc4VpSXePm8P9tfKMK6NchLw3jFIsaWDXiRAMbYrB/X1deNd/LsbXdLqnzvbmJHR2+6MvvdsPy+77Zwh0aj0dxKfvWfj9/tLrzsWBUGBwKMTJMfPN2cwjt6GpkvIzavB0+BFKighTIEgx+RtKVyVL4kmfj4buIXTNoOFim2GtTet4dih6Tjm3nGHowT2NOKVfaIfu5pRCSM9DxERyuVnjShoXmcbAxrZBZncBjvDfdjHDyNun8jRq6Ce/o8bjZJ/jVdzK+TGDv3UYsr1v6DH1+i9m5nblOElidGGNsRI9a+GzsiSfzTQVCen6k8HEIt5HBnZrHftpPQoUu4c3MY6TReqYQIBMC2EYEoWCbKMFC2g1KKqb0ukUET5007GH8wSNd/2Q977mX0desRHkTGPYqdkuCMovXxUfKbmwif82dCqVZxc4XvEDSi0WhuGUoBnv+5zPgQUvifpomMRVG2gwiHUKUySIFsbQbHxRsYwdvajzFXpNoep9xs4WyLIzwotQh4xz6azjjMbDVpPmZjVF3m1gfxTIF0FWYZIhO2n4xveI7ipizSVgjHY/zBIPEhRWjOT25qlT1CUzWE6+FE/RUYa6FKZCjP9Bu7gW6SF0qMviGKlQNlQPKKQ2DeZuhtIaIjUG0SxK4qrJJH/GKeQl+c8GSVYmcIJyRQAiLTDqHxEkiJMZ3DG5tAdHegwgHKXXF4WLtUaTSvVv73oeG73YWXTLGqX7BeLK+KPByvVrTWvUbz0nhBY5M0EJaJEHUDIxAAKRDRKHh1B6NgwA+kDgVxmxJ4IRM7GcAsO9gRk0KHifD8F/xSq6C8xiZx2gIBuY0OZt6g41su09tM7LjC6aoSPRlCSZA1KKx1iQ4aBGcVqYtVKhmLSpMkOuYbGoUuiVkEq6SQtsKsKKa3S1LnFcKFXK9ESag1eaTOCmb21Ug/EyAy5TG5U9J2wGX6XhNnS5Hot6JYRYVngvAgeaVKsS1AIO8hlCI8UsROhyhnLcLTNp4hsOMmqh4HWo1Lnv3kL+qxSaPzcNxlej/y8Ir9rR2JO+JSdW2717Lac3HsvzTN+//q6cb+au/vneZmeThWxQqHMA2MdAYRDl+3pG9sXId3eQi5oQ/31DmcN+1g7DX1XLgKOr9VJnB+jNyeHqJXS6hDJ5j64F6aP34AuX0z3rEzDXeja2UkjUwTlR39BL5xgtI7thM/4rsUFH74QWKffRq1dzviwDHM/l68RATv6GmGf2Mf3b+7f0Uf539iL7WEoOXP9lN95y4iF2dhahZ3bg6zrweVLyDCYV/xpWb7Ptz11Q+kRMRj4LpQs/3M467H+Ad3UuxU9D5SYfCDLl2fDDTcwUQwiNHZTq0zTbk1QPRzTzP3b/bS/I3R6xS5NBrNbcZzUTWPxckb4Ti+O1GusFIEYlGpatBXuAtIAxkKYvV2EfzK+YagRGtXJ87VEYz1/aiRcdpKpca1XV9+YV2K1v+Bn4U8FY3ilSu+2lU6japWSR7J4lwZAiCB7y5KzQbHIfNX/kqu2ddD/CvTiDUdxJ/J31AK12htIXDcQXW24B0/28igHqv32QCCoRAym0EVS7jLs65rNJpVQ33O5K5z/Oo893al7nY3bspyY+N2M5mvkI0GkXKV/DgvgVVhcKDwJRgLRYzmZlRHFnXqAuV3PED0/Cze7i3IqTzGuj4826Pc7pB51mD+jRVGvTDdxSyRkRITD8ZpOQTS8WM4Rl+fJrJ5D/H//TTi/q2omoORWE+tNY41X6HQGyM4ZzP94w/QdLrE2Lu6yZ7IEsi5lH7gQSJfeBojm8HNxJ
l4MM7Cv91D96MOo7+8D+FAqdMjOCMRHiSueIid25i+x2LNM/NUdvRjJ0xCUzWsCRNlGnhtaeRCCVGuohZy/kuJkJAvNHy/RSCAquTp/MIQE2/v5uJPGVAyGPhBwZZz3czu62zMVtZSUG53CW7Zh3AhsqWFSL7gK98US1C+uz+rRvOqQalGzIYw/WFVRiIopRDBALgeXqmEkc347pOz875wRHMTjE7AnntRnsKJBzDPjWO2teINjzL/nvtY6JOkLnkEci5m2cV44llmf3ovyctV8t1BwtMOY681aT3kErlawhifQ1UqqK5WKm1RqimD+JUSM1ujBIqK6NUK5nyFWiaMkYnjBUyk48HwJCM/sharoHDCa4lMuoSnbUq725GuIjQdp/jGXpygwLAVnilofmwYL5NA2C6iVPUnd9IRPMvAKNX8umuOH7g+Pkf1gX6MkgP7tUuVRrPaEKyOl9o/f+ISf/ETO+52N14wv/Pl0/zm92655fWOL1TY818e40NvWscvvm3jLa//TrMqDA7l+fKRqlhChILIXAnHcYg8fgq3WMQ0NuDW81pc/KXdbP5Pl/zZQKOHlk8d9eMhdmyl44tDOEDTyRIELNr/7gSiNQvxOO5zpyCdJvemDSQeP487N0e4ns+iZXojyjJoOltFfPso4UwT7swsRirpt3PoBOa2vaz/8EEmPrSP9HkHaSu6PzOKc3Vk6T6AbOsuyKQJHb5IqCWLe/4SoruL/I4Ooo+dgVAIZ2YWPBcZjforG54HrovyVMMtwxm+SuvXBOVsN10fPQBK4QCJwWESy767od/cx5rf2Y/Z040XjyLCYbyZWd+tQxscGs2dw/OVqJTjIEzTT7wnBOSX3FZXJOqTBizO9h/0AxBNlqk+AYl/PLji791obcEFmj55ACOdJvWkv4q69lABEfRXft3pGT+p6PQMASAUjeIVi2QOLslyLypdKfzVFlVvt/VPJpa6F4/j5fMNgQwZjZKoC1s07kcaMHzVzx0yM7vUT9NEBINY6RTe7BzCNHFyOYKlsl7h0GhWKSdGbn8er7uV3fyl8olvXuItm1vpb45dd+6T3x64LQbHZN4X/3ny3JQ2OG4VAvByBT/pXaXSSF7n1R9u7unzjbIbfv6ZRix09i9H8agrRx051XhQq4DEPXXO36mro5htrTjjE0Q///RSLHXd1WGx7GKqGVWzkaHQikR6TX/rq0W1/qn/cu8MDnOjXJmeJZYyBNevdYavEh6+6stF1vsjIxHfyHB9H2/l+S8lynEaD/ryxlY/WBww1vXhXhzAe8P9mE+fwaurUPV+cY7yO3bBVw757T//163RaG4n0kDUX7iFYfgGSMAC0/RXNKWAbBplCOTUPF5zCuF4KFOClOTXxYmMVZEVG2NkGmdNC+bQJHZ/G5wZ8gUoBucR1Roin8cdHcdoa8FtTqKOnAKlMLs6wTRQ5Qr2hk6QYM5XEI6HKJYpbWolOFGi0B8jNlSi1BkmfnAQZ3IaY2M/XjToJ1GdL+KlojixAKrmIjwFB49jbF6Pe+YCxsZ+VMBkbmuSpm8M4U7PIINB3FwOozkLjuPnHmlOYYz6yQqFFQCtKKnRvCr5j599fnUnxd2PLV5Oserwnx85y39+5Cznf+8dd6TNZwZm+ZG/PPD8BV9GrAqDY/l/LWGavnuRUhjNzX4ujOXnd25DHT7J6C/vo+O/+i/jzuUrAJidHVz8+R56f2Plj2R2dYLrYmxchyiUcEZGGy/w5e/fTaHdIHu8jPj2UWrfs5PAVw8jQ6EV2vfG1o0U1icxCy6q6CAGh6m+YxdICD7sv+xf+OM9rP/wwYZBYqSSeH1dCNtFnb0I2zdiTC7gXB3BK5f9lw/lIYx60Klh+FnX60bJzJYA4ZY9ND09iXvhMgCBoVnoaEOYBu75SzjxIKGvP4dR9/uW0SgiErnue9NoNHcIz/WlqZdTuqbMspwaXBMXEa2vvC6uOjA+gQOI8QlcfGnvFbU7Ds7gMAwuKb
8sX3mVy9uqE7gyhAKiz/nthA8tray4Zy40yi22c23e38Uyi5+JY8uuryc2XRHvcYPYD41G8+rj6vy1g+HqZ/k76h9+/fwNy+QrNvGQdcvafPLckhS67b4yppJXhcEhTAMZDSPCTbhT0xhNaVRXK56UOPeswcpVMcbn/Bfqi8Oo7ZtpPlpj9Jf2UdhSo+VJi9m3l8l+OUTvbxzA7Oth6Ac7cYPQ8/A87snzGK0tlNamCY8GmHtrD8GcR7glQSlrIGtQbg3ivH8PSkJ44zrU1THciUmMdX2oaIjL70ljJzyiwyapSwbh3fcwvd3CykH+v+0lsCBY/+H9OG/egZwoYqTTvutAsAeZL1F6xwNEz83gpWOYsgs1twDBIF4uh5FOoWzbV7ERAtnawsT3r6XzoSvw8xGm/shEfmYvdgRans4x9vok+R0VNnR1Mv6VCN0no5z/0Br6/iWLvDqDqlR9eVz9jNdoVj1GNuOvDESj/upqOOS7Y1HPv+O4DZdSs70NZ2wcs7uLwvYOAnkb6+wI3uw8YnM/olTFaUkwtzFC89MzeEELTpxbkRPDrE9OmF2dfjzJfA53fh5jw1rccxeRkQgiGGy4PgnTRGaacCen/ImgdBoRjeC2pZEDo3i5Au6erUjbxTg3DC0ZVMgCpRBjM5CKIyo1vKlp7D1bMJ589s5/yRqNZlVwcuT5E/t99dTE85a5k7z5D55sbP/FNy7dsMyfPHaBkfkyzw3Nc+BXb63qqrcK1GRvBavC4MBxcecXkJUqMh73M46bElmoErw6CckYztURqu/cRXC2yuQO34eu4/f3M/vTewnPOASPRzBsFxEM4iWjBPKK7FNFRLGCbG1BpeIEHzmE7F1D+u9OU/qBB5ncGaX52RJmvop37AxmXw84Lk5nE9LqJr8thbQV1aSk+9EyTtSk0OHHWMhTl0l1b6PQYdD8bD2/rzQwHzviq7QkEr6a1Pg8yjIJf+VZ3J1bMCcWUJWqv5IxM4uMRvCKJVS57AeMux6iWqXUJnDfOMrrjlf49vvvY+h7BUYNKu0RohMelaEgU0+tofvzZ1l422b6f+WAH5AaiyKCAZT98vST1GhergjTbLhGClnPPG6ZDYUpYZq+y2SmCbct7Sf3a4rD6JRvBHQ0waGTqN5+jLEg7tQU7unzGNmMP55k0qiAhcxFUfEI0afOIeJx3K5mDCGopcOYQmAcvUDzTIdvPESjlL/nfkKTFYzhSYiEcYeuYmQz2N1ZSp0hEscMZGcrolRpGDRSCOT2zeAqGByBVAIjEcMbGvFzCQGiEsfZ0I2dsIhcmPZXmjNNiFIFUfJdPlU6AZ6H1xRHlEoY3zyG3LYJTty930mj0WheKEopJnLV5y33V98auO7YRK5COGCQeJErH1XH5c+fvLFh83JmVRgcSikQ+HEJlYofSDmwLG9d3T0o+IjvutSylGCcpk/67lOddVlnBaijp8nW3RIaddQ9DBZlIBtZwVlyNWhIytbjLeIn/d1FeckA0LTsmujnn26cu5bFGcrF9gA4ePy6uI/lGXgXZyHd+Rprftt3F/vWvSHgLF0nl64JQiOQ0wVin/G/EHd6BqZnbtIjjUZzO1Ge8uVvAVUfVJTr+kbH4nYggDc1jQwFIGChAiZ0tuAeO4ORjGG/Zjuz68MUupswS+txolBLegRyEjeg6Nw1ytDEBoyhEHY6RXjUJDgHbjBJYF4RGzXJ7d2OUYXI+iY8S+CEBYX2GOW3xAnkoRbvILAA0UmX2GCJ4YfaqKUUqfMwudfFzPWROaGYeL2LqEiC001UOm3MeRNltmCUBckLkBiqIv9/9t47TI7rvNd8z6mqznFCTw7IIEAEEgQIwJSVgxWcJFmSbVmSZft6LUteX69l667DWvajq72y17uyLXslWbLsK1OylSOVGEQRIAEiZwwwGEzOM909navOuX9UT88MCBAEiDAg630ePF1TfarqdDe6qr/6vu/3Kysy3RbpFS0EZpvxzzqYORttCo
TSVMImwf4scnYOfD6MxnoY9co9PTxuN0cHZ2/3FJ6V8xNzrLpMg/at4h8eOcd9XUn6pnJXH3wF7v/Ij0hF/ez7P191Tds9dHxpecpyUQ97viyLgEOYJkY07v5hmqh0Frmy01VvMg3KzVGsqTy5FTEG3qCReQNtaRr2SWbXwaq/OsL4O7cw1w4NRxWxnixiaILsAyuRtiZycBD8PnAc0ve1EpiuIMuKfIuf4GgJO2wytsOHmYeWv9mD/qmtWL2j6ESU3OokWkK2zaT5Mwc587EtNO0RzLVLUgdL9L3JYtV/FOj5dYvk0xaxfpvARBE5Lwk5V5WKskyciB+hNPL8ELpcRsaibmYjGETN5Vw34kIBkYgz+po2pu91SB42CE0o0qsMUgdKzK72IRT4sprJzQInpGl7VOGfLOO7OAngennM5WDuNn2gHh4vQoQUIM2lmQ0pEUKgHQdZl3S/3+EQFMvogA9jeg4d8FF5zX3kogbaEIRHbWIX3Ygl32QRHi3jPz3M3LZOgp+zWF+ZQQd8lJqj+AcmwTJRfgsnbIEh8GUNCg2SUp1J4rxNcNIhOlfBOHgGtXUthaYAsqIJjBdwghaxfoX/cAU7YtD2I4Hj0/hnHTb85Yh73qqLYExlKaxpRJYVZrpEvjNMrtlH/GyW1BNFKg0h/KeHqKxqAaWxpgowOIrc0I0YnUDXJ0nvaMUoa0oxA/71Nn9YHh4vcp7uW95qcX/1rZN89j07btvxP/Y9V0zoXbu6rnnbH5wcY3t3EoDxbInpXJlvHhnm13Z11ZRIn43f+8Lhaz7mncCyCDjQGlUoVpWbFEZzikp9GHM8A9NpjFM9iDUriR4ZRb6lHnPCJN4jcAIQHhaofJ7gpCK9RjC9wSB+uMjI29bQ+uVe7JFRbEBu3QBaE5wsk2v2E+vNETs6SXprI4n9I0RTrdR//zzpX9pJaKTkNjyOjGK0bcPKlMk1RxArOjDykmKdoPVje5Cb17PmcwYzm2J0fcVmcpMgfHS4qskv0QODiPkG8jUrMXqnwFE4MzPVeu0yumKDckuqVKWCKhYxIxE2vucETz28kS3vPs5MKUT4pW5NY135Hny9Y8y8pJPgmCCzqULwa0/Djk2u/0bQD4WCOwcPD49bhlbaFYHwGa7HBkClgpbSlaytuo7rujjaMmr9FnbIIt9ousHGcJlci4/ArEOhzsCXU6RX+MnvXEnLk0V00MfkzkYS5wr4JnNM7WzCKigqQUnyeIaxXXH8aUW8zybbZpJrMgiPwuj2MKn4JjLdJsFJhVHSXHx9jNRBm1yzZHybj8hFQTEFLT8p4QQNht/USTkBdaccgn6TbJuFWdKoDj/+rENorIIKWJRTIQLjeezOFLJYoZwMoH1hdNNqREUhVrSiTIk2BMoEK//CaID08PC4eTxy5s7NhP7mvz7NysaF+pe3/OMeeidz3NedZGNr/Jr3d2Yse/VBdwDLI+AwDGRnG6JcQft9FDsSSFuRvieFP+0gnA6Urcj+UZb/tuK7NO+a5a8vvJbR2Rjlkslc5y62/9RpRi50U4xY9PxFjFTdKIOvSlI4uYK2H9v4x/Kk74oxca/APy2ohCNkVkWwMoKJe1qRFZhdtwozL8ingpirdxFIK+ZaDTI7FcYA+H5eEfsqlONw/q930rhxgrGzCYzGAutax5iZTXByXQtdXxYU6k0q4Q5CEwr7gTasvMYs1GOUHKypJkShjLDdEgu0xqhLQKnsauPn8rwieZpff/tPOFzs5PuFDUx/ay3TZ+vwzUoqv92EnoDGteO0BfOc/rv7kfUlGr7TgT+tMApJfDNFePp2f7AeHi8i5supymV06ZKa33x+wXOjWvYo/H7oKWECiy9BCdzm8MDIKMI0Cdv2klLO5OGF5cQihUkNpA5VJbeVIliVzgbo+Lr72Fj9W5gmnQ+5JZyBy7wU6zLr6564/Mv2s6DioqvbLmbe6yO231zSvO7h4XH7uJLaksdSys713bztnVgoxeqddJ
cvTuWvK+B4obA8Ag6tYGIK6pKgFL59Z3E2raQUlyT2j+AkoxTawlS+3kj2AwE+uu/tOFN+gsMG0TnIt2qePLgWbWmiZ01iFwVjb44ROBiiqdfBnKsgs3mCEwFCI36CE5roYAll+lEmxPo0hUZJYFqRPDJNoSOGsBVGSZFrCqCKBvGN09QH85x8oETXvxuMhiz8f1fH+gtT8y5enQAAIABJREFUTN3XwPmVKxAKrJDGN5PHzDkYebdxO7syTOR8GqRETmXcTESx6F58DQMRCqHGJxHRCJQr6EKBr43dw45kH49NrCHqK1L8cQOxHDgBcC4G0AbMHGpk1oFV3y8wuTVEaLSEb7aELFQQOc/1z8PjtrAouzjfKD6/LEOhWi8HDUnXK+PCADIRB9Ok0tGANTKDDvoRbQ0oy6Bc5yPT6WYmEgfGyGxtQpmC+GO9qM4mnJBJKWEhHBBK4wQkkYeOYXZ34iSjyFKFQkfMLfECKmFJcKyEmS1RbA67fiC2JnhyhNld7W5Jar7E9PZGfDmFLGnssERWNJWQJNpXPbdIgTmdo9QSwwlIAiN5tN9AOBpjYByVySLaW8B0sz0iX3S9QvZ7HeMeHrebuZIX/F+JH55cUMl6cF//s4y8Nn7n8wfp++gbrmvbvskc3Q1X6hq+M1gWAYeu2EtM9oy1qxB7jlC/p6rt3geBQ6B+8X6+uzFB49sCBCcrmD96yi0l2rdwATM2rMU5eZaVX17YvwwEcIpFrHMX6DjVWnP7bXgEEAKzu5PYwBDattGBAL4TxZp0ZOPjC3cFnVCIu+Iae2SUrkdcx14HSJzqIQHk3nI/4S895c5j0euLn4rWmsMV1FRralQdekXFds0PgfLPB3lsJojeVU8+XaD15J5nbreI1E+q7yVAMom69A6rh4fHrUEarreOIYGqGETVDNDJZl1jvNnZhYwHQEYiutqQFQdKZfIbmvDNlF0FqCfOYs2txBpOk9nShC9t4/ilax64/xjm1g1Y43OonguuCtbWDSAETmMc2TuEmssRyjRgDw5hrF0FK+uQPzmMAgI9IURbM05PL5VdW4j/8KxrUKgV8c+fx2hKuZK7G1YhJ9PoYhG1ohUcjRwYRTc34j/WjwgH0dOzOJkMIhzGzuUwGhthfNItl42EUcUSuu/GXbw9PDxe2OTLNiHfrf+Z+oX9y+88dWok4wUcNwojEceZTWPU14FlYjQ2IqJhdDaHCAXAUYQH8vR+ZBdtP7YpNFjM/NluOj/sqjmZXR1gO+S64vhPulrzem7ODWSkrGnd20PDGBvX4Zw447p2HzyHfeEiuTffT/RMGjE2BcUiOuMGCPqntjK5OUhqXwZ14ATq7lUMvWslXV8YxO7rR5gmpVduxf/oMbdeescmjAuj7o+NYACUck25dmzCHJtFz6TRto0qFJChUM1NXQYCqKLbx6LyeZyZGUZ/bzfGK6cwvlZPg1yPGBlHhELYA4NM//ouUt/vp/e9XXT+xZ5aoGU0NoJWqPydZ67j4fGCQDkgBapYXrJOlZTbr7ao1AlcHw6dL1BJRbCOX6SwbSWhc9NUUlF833ua8d/cRWTIZuxNzTQdKKClINNlonxxJt68G+XXNBzRRMN+Sk1BylEDseZuSjGJb+VajLKmmJCkHjHQwFyrSfj12/HNltH7TqB6ein9zHaMsqL402sIX8ig/RbZV60hPFJiZs0q6k7mQSQYfFUC5YPOb88w9ua1VMKC6EAcs6ApR1sJj5aZbfcTGSpTCkiCg1mkrdyelVwRU2n3pknvrf1IPDw87jzOj+fY1P7iKkH60oHB2z2Fm8ayCDiEZSLqkpiRCPmNLQhHEwCGX15Pw7ECuRY/uRZJOQqbX3KWT73zG3w6vYnvj23g/L9vxZn1sW79EKNf7ebud5zkwI7dOH6NmRdoA1r2lvFNFJj6+bWUkgJZBvmy3SAg/dYN7iQU+O+uIzSSJPVUA/0/k8SaAzOvmd2gKL5ScG9rgsniNPl+H6f+oo
FoPEzFNijMCFZ/MMXG4EmO+jfQIgVznSHKYUHsYpmZ17cTHlXQHiLSG0WbEnNoEm07GMEAwu9HF4uYVQNAubabzk9e4C5jH/+18VEeWrOW/ZkV/OjpuzFyEmtlgkq5QMe7y2y3jvOTVfei8yb1+3cRmnSwsg6+yUY4cjs/VQ+PFxlCuJkN00SV3XJKYfnQdsXt02poQM3MLOljkKGQK2cNyMcO4QD+J0qIuiS+8zl0Mkn9p/Zi3LUGsxDFeOIYwjRpPhpi7oHVrPzcAHObWgg+dBC1YyO+mTKhpwYR0Qj2hYtUXrUN32PHiDWnsAcGMZJJUt8vgm2DZaFME7miE/93XclxfzIJhkR3NBP76iGEz0f9Y1lyb7mf+IFZ6k+GCT52Emfzahr/eT9iw2qcaABzJo+vMUwlbJL8wgFkIo6aTSPrXaUWEfBDsQQB/1KpcA8PjxcNU3PXVnmhuT3iN7dTc+f/+M8X7g+3qwYcQojPAG8ExrXWd1fX1QFfBLqBPuCXtNYzwtX7+v+A1wN54N1a66vbymrAcdCZLKHTEkplVC5P63dK2A1Rkk9MEFzbhLIkPdm13Lv2f8eMl0l+P4i/RVCJadSfNFB8LYx9aAX6FVB3UuNPOwS+tR+0RtbXkYj7MR49iHpgK74L46R3dqCFQXBKISsas2ATHHUzA+1/ewB173qsC6M0PhHl4ptTnP7uXaTXQecjNsV6g1IiQLCiSfXbyO8OMHnf3URWK8yBSWL5GOp4D3LTWlq+NoTdlXJN/2YzCEOi8gWEz0I7ypWwNQx0Lu+a/x0/S98O90fJz37gg9zzy8cYe1cTLZshMFVmZm2U0nrNkdFVNB6EcLMkOqCIf+f4QsNqKPQsb7iHxwuDW3J+uvokEIYBQoIUYFkYAT+YJsI03UxnsYQOB6EzRSkVQvkEQlHrqM50mtghKN6TR40HUCEHc9qEzgJ6eB1mZw6n14/zns3osgQlkMEK7aqFmbUmwZF1TG9wv/OVHetIHSgw9bpWmvammXznNkpJgRadmEVwfNC0L48TMsm8toPMarCjjciCQAU02q8QBYmo24AQYJgOulcy/PJm2r+v6f/AFoLjGmfbDuwgRAYVVqOPuWYDf1oT2LoOZSukFJTXtGKH3L4OaWvyzT5m39kOH/7S837bPTw87iz+7Osnrmn8yeEMm9sTN2k2Hrea55Lh+Bfg71mqnP7HwI+01h8VQvxx9e8/An4GWFP9dz/wj9XHZ2e+7AigaphnNNS7zrW9bh+HOeCmmZofgubqZnrXFpL/shANdu2pPj4G5ddtJzCcdS3hhcCZmsZ41O2VsM4MoeMRwl9+img4XCtrApDzpV2xGHrvEbeHZHSMju/6GXlZnJUfdA/iB+Ql2+qnj5MYSGGPjUO1T0QdOYU2TRgZxZYGMhhA2zZCCNRczl32+6FcQVhmzTBwnqaP72H44wBZImfOAdD48EJfiYxGiWezCMuHqpQxmlI44xNeSZXHi4V/4Wafn66GkGilEdXGLZXLI6t39JVTlcKtlN2m8UQc/6FZzM421Oi4K4Pd3kbkhIWennFLQAGkgREJU7lnFUYmC8fOoG0boymFCAaotLm6VUY2Q/jxEXR3G8nTBeyIRXAgg5hOkwi2IzN56o+6vSRyYJzilk6sTBlRcbCm8wSOzJKKhrGb4ognDiO3bkDMFdBjk+RfdhfKEkSPTeD0HEXv3oJ5ZoDQYAvpdVEiIzahPWcRdUl0wEf06Zwryz2bRXU3IwJ+zJkC5sEBZFMjTjJMtGgTHpSced5vuoeHx51Gyb42Sew//sox3r6j8ybN5sosR1OB5Tina+WqAYfW+sdCiO5LVv8c8LLq8ueAR3Ev6D8H/KvWWgNPCiESQogWrfXIsx5DKWRsobHabGleeLLaKD3fxL2YYspP8JJ92a/YhvnwAXwP7a85iMtIBJXNYnZ3Yvf140xMYKTcC7Z931qErRFPuFqTalU7xkWzVuZQm+OhE1j37qr1fw
DMveZuQl+tNonHYtDejJjJILZvqimxGOtWoweG3TIK5SAb6rAvDqClgRGLoB0FSiEMAyeTwUgm3YbSak5vvrfjSoy+axOpv9+D2LgaWbZxTnpSdx4vHm7F+emq1NzFnYVVlwn4tW3XziuLy4ouPa/N79PJZJCPHVpyoXHGxgEQ1e1rl+/DaSTgA+ZnYVU9iOb7JRzA+r67vV40jjEQ56qHPXyydqzAN/fVtgMQe464y1PTxA4s7HNe7GMJExMs1sBRvYtu6jxztMcyQQjxOtwsoAF8Wmv90UuefzfwMWD+P+3fa60/XX3uXcCfVNf/ldb6c7dk0h4eNxh9G2qquv/42/zha9fd8uPeSq63h6Np0UV6FGiqLrcBA4vGDVbXPeOCLoT4LeC3AAIijBACs7kJp60Bnc6T2ZLCLHQQmChiRyxKIYOx93bR+dP9jGSj+L6VIPq2YXpesZO2xzSDr9IYBYmsCOJdu5jeorHSEiegWf1gGjOTx4mHGfjV3Vg5aH14BnHf3eRa/JSjgsJP70Y60PpoFmdlK0YyzvjLmtAC5jrAKArEvWn6Wuop/s79ELbRFYV8yU5UzAZH0NQxQ+aJlXR9fRq1awsT94SJ9duwtg5lCYJjJVS2iLmiCzXuuoKrXB6jsR5KJTerUi5j1CU59ZFVJFsypNMhfm3zU3ztky8jOmQzebdJ50MZzn7Ahy4ZJFLT6ANbkP0TZLe3Y7bfhzVbwkgX4PR1froeHnc2z+v8tOTcxHMoTbzUOfYKFysZDqPy1bJJT0XOY5khhDCAfwBejfvd2C+E+IbW+uQlQ7+otf7dS7atA/4cuA83nj1Q3XZ521l7eFyGQwOzt+W48+7mL1Sed9O41loLIa45HNRafxL4JEBM1Gknk3HLqUbHcIDwuQvuONxbLQbQ+W1325b5nXwa1nARgLVfWbr/ukXLioU7gR2HFtYBRK9gjucA9T3urcGGa3hdcc6hcI2uUnuf+fziudSOVb1rWSOXY+1/ccu/GoE9+EjhlnJ1fM19T9b82tJNbCBYLeOan7+Hx4ud6zk/XXpuetbBQoCQCLkQdGjHQfr9YFkInwUV128HKTBTDW7ztN9HuaMe38AUamIKEQmjm+sRQ+OUNndjPnyA6ffsomH/jKucl4gyvSNFcMphep1F/KKNcCAwUWT0/gjhMUWuSVKJgFEB36wmMuwgtKYUM0DA5GZBx8MVAr1TTLykGX9WEbmYxw5blBMmlaDEDgqC0w6hoQKiVGFmc4K6A1Nk76ojMFUm2+4nOOVgFmxkwWZ8RxQtQQtAQnTAQdoaK+tQrDMxyppyVFIJuQIejYdylOM+eMjr4ViG7ADOaa17AYQQX8DNCl4acFyO1wI/0FpPV7f9AfA64MGbNFeP58GpkczVB72Ima16qN0M/p8fnOUTj5zj3Edef9OOsVy53oBjbL4UQQjRAsz/Yh4COhaNa2ch9XpFhJSYTU21v8vrWjFyFZywhZku4QQtVMCgHDNJd5tUXpKhMB0kcdjCqGjyTYJiysHKSNofLtP7KwKZMYlclKQOFDAKFcoJP0hBtt2i7jN7US+9h2Kdj1yzxD+rSZyYRZ+5wPh77iXeZ6MllGMGkYsFet7tI1hfoDgSJjBqYOXADkHdaYfRnZLwoEBLCE4qxrfDus+kKTWG8I/nyKyLow1B4tgMIpNzTf/iERidACHQ+YLbdGqZOOkMMhRC+HxU7u7m3K9YNHVOk84F0acixHtAG4CG6U2aurumSL6hh56P348OKlZ93sE3nIZ01m1En7vOT9fD487mhp6fnhWtQTvoxXcRhHB9cBaXQs576FQ9dwBkXz+OabomgLaNOnIKIxbDf6gXB6j77F7EutVQKeP09FI3m4VKmdbRNsp1QcyHD2BsXEfrZy6gslni0SgiFIREDOfMOcyV3dgXLhKoZlTi/2GiHQdHa+rO92EkEohwCD04RMjyIQL+Wlmrrv6rz3ZhX7hIuMfEaEoRf2wY4fcjfD
5kPEbLQ3NkN6eInphCByzk7FytHy+C22MWMSSkGnDOnsdsb4Mnn99b7nHTuFwG8HI9Tm8WQvw0cBb4fa31wBW2bbvcQRZnEDs7b319vgf86NTY1Qd53BQ+/qMeAN71mX28cXMLr9/UQth/9Z/ix4fSvH5Ty1XHLWeuN+D4BvAu4KPVx68vWv+71Tsj9wPp51IfrZXCHl34AsjRMTRura+SBkI5GECw+k9+emmzdj1grF6BUxeBfcdYXd6K/ImbuhB+16DPqo6dz3zIxw4RAmItzdgjo7WsQ+M/uWkJs70Nf7W2em01U6F3bUHsPULlNfcxs9pH9OHThL6ytH45/j/dDIaF+xg56q6fzzgYsRjO4prtS8z8VD4PuRzy8RnuOlmHvb6T+BOHQRqYbS2ueVciQfJzC5nqNR9Y6CO5tOncw+NFyA09Pz1nqoZ/2ql+27X7KExzYd0laNt2+7uqPR+Xfn+dqlAEgDMx4S4cTtdO3PP9ZIAbLGSzUM2Y2r197jGq5VuL5XjRGmdmpmY+qCvlmunoYuwLF2vbzhum6lIJXSrVgpPQuQu189ul2dv5MbLozuGy/SoedxLfBB7UWpeEEP8Ft0fqFdeyg8UZxPvuu++F0At7xyEuLQO9RUznrr2UVCmNlLdnvjeTx85O8NjZCZ66MM3H3rL5quM/8eh5Pvi69bdgZjePq/bvCSEeBPYC64QQg0KI9+JeyF8thOgBXlX9G+A7uC2K54BPAb9z3TOb/0KoZ16oFwcb8zjnLtQcx+VPDtfWX61W2h4ZXTik5VtYf5kLo9jrKmJZ33+a1Cf2LCjKXAPPCAgurfde9LczNV1rZkc52AODCz8UqpjdC3eIvGDD48XGbTs/XQ7luD/albPkvKVt+/YKu98EzOamqw9axLMJX3gsG66aAdRaT2mt5y+qnwa2PddtPTwO9l97b8T/+8MXthDORLbEPz324nBCfS4qVe+4wlOvvMxYDbzvWidRK6kSAh0NI3IFdLkCjUnsZAhjrkSxOYwTlEz/Wo7ceJjUHoOpzRprTmIUwfFD6+MlfJM5hl9ehx0CLaHtxwWs432ouRyyu51yW4LZ1X5Ckw6hiznGd8ZQliB+wUaWFYUGk9CEjfX9p5Gb1yOGxsm8bA0IGH45JI5Jmn84inPuAmMf2I0vrZnY5RDqN2n/yB6G/3A3nf85iJ2KY8zmKXUkkLbCGskgCiV0Ngt+PxSqF2Cf5ZZZ5fJorV3dfik596d38+qXH+LoVCvjB5uI9UJ4zCE4kiezOkKuWZLZWEGUJWve18+Fj+yi7bEKwf40VGxEsbw0we3h8QLkVpyfbhTz2VaoGgI6DrVaLK0xkkkwTUjVuY7kPX2IjatBKZASJ2Qhyw7KbzL00yG6/nOEzJYUseNTzGxrIPmdU9BYT7k9gW9wFicZdsUjDPe+ksgXcYZGkd3t5NfW458oIhyFzJdxogFkvoLMF1GhALlVMXwZG994DjsWwLowSu6eTpRfoExB7NQsor0RLQXZlWEi/QWE0q7fRqZQO54uVs95UiJjUXSxeF03ajxuCfuBNUKIFbjBwtuBX1484BJVt58FTlWXvwd8RAiRrP79GuBDN3/KHi90HuuZ5HdfsYb3fm4/f/S69dzd9sJyHu+dnGOuZF994AuAZeE0vqSkanGBw8QEAjdN76vabbR9beHpy9nBKKD56NJ18/canZ5ejB6of7R6XKDx0NKxvkXL6qgr8xT+sluytOZLS/fX9HG3kTu5SPyv9WN7XDnIPneceXbpNldj/gfJyg/u5TwQppcVLES/GrfRPcqCHwnAiv+295qO4+HhcQORBijHzZJKga7YbnlVpYzw+5HRCM70LAiB2dbq9mzk8shIGB0JYTdEqZgS68h5KFdwqoIV+vBJ7Fdswz84izh8Eg2YTSnanxiviWs4QOzMuZpErdGzcB7QgQBIWZPpNbs6sHt6CRdKqJlZRCiEak8hyw5iZBy72mMS0esotkTQ5y9ihk
E+WLIwlQJDRwYDTrVxqaGKRbRAwNlnWQhdJ/SOQUZ+2krTz8HsyGEd9jLdpWG8Yhvpekn1k1mydV7SDRq1P09R91gBzed1CgimUmhVURi7aj+ni8tzi3Lxv1L9G+HxlD4NhMeDqnbyuKywF820SbUHQAhyVYJkJwRGBMl1BfQJA9WaxeMxyecMaqtTjE+GWds2jC4sop4ce/9rHfmYwoxaaBmN2BEBNlh+51MWFZkGQbzHIhvTnJAoHZSAXK1CnxXka218oxrFqCJ8AnJxQWZ5Aem3EELh8RbJTAQI1mbQNJvckSoCw4JUh01gSCNXq5BZgZ4FYxYCYzaRp2awIz6UFBhTGUTeRHl1rJAXmSlihr1YfkkxKOE737qav5aLyxXjS2/fdrWHcF3z6xuaztvm0Y/cxpb/de8VGM3S4Ld2Lqssh33Gs+rj82/bwtu/4Hgff+f5nZdkXEvC4FC2jZ1KoXJ5J7FyeIT5qdQaTrGohk8/BDjC3+DUyMi/aydNf7NQEeJMtWCttharNDMX/5zT1s5kCHxnoc629VQvoad6K/UsLJyCfo2l7v2/tRMAc2CQ5k85alkKqKvbivdHeyv9lOcPq46dcIpr7T+KOW+WsRJe9TRot56mqVT0tevNjy+4Ll/pM2h4CJQSUY27H8WGyxIW5uLich7Kxf/yzt1D5fMLZ/eHnXo8AueeESrdLkLM1feZPw8na2sRXg8IQbQ5gHmswOzGDtL37iP2SovIIwNkVzWgWSbpRg+RExlm2wJEjiax9z9BHCcMNXh6CNnSSK6zBu/JKUilHY+ylBXFv3KdDLFxLVbIg3GoD+pqMGsNxK96EFtuQOt9grHXrGHlx4+Qfv5KvA8VwFbITAEz4kNmi9gHjmBURRGhEObAIHpLM2LGRAzbkM9jSImcTWO4yfMuLi6XgN9+XjuGPH9BupqQl3jQw2T6+r/3vOLGRlY1XHyC9y2r6ivLfs/Z4kbPhiVhcAgp0VtbsUfHEWvXoIRAeSVKamh5E7WvB7l6Bf2vrEGtTxHy56n5sOTo26L4RwVH79qCltQJn9CYbVOI5iweb5FCb4TaxxXhE2nUiiam1gawDEHTd44zfnsntg5GRpFq0/BNKIpBQdUJE5m3ydTqmAGB5RFkXjBLW02CxB5IfGonwdOCQgTMoKIYVkSPamQ+uZPQKSiGBM33TDGxJYZ/yiJ8cIzEtgZCp3NoeRPbkBinxrET0whdB0MHW6EKBYSUTmhCQx3LvzHIg0OdrK8bYigd5WhvI/FHdZSATIPADCliaycYH6hypi69FvX3GfgmLTzJInoiAz3n/+5dXFyeJfMK/YmS2pyQGmgawmM4f9emiRYJY6dmEX4fKpN1RB3qa7EjAWyPRJ9IYZ7oRwsGsdZ3oaTGZLePugfHybZX4d/XhzAMjHv3IdeuJPL4CObAIJ7xCVQ+T/nREnp4YRXvcuE/s+8ket9J7M1rUTVh9LEk9qgjD+4U6JtBRiLMLgtiJC1orCOxIU7kaw9j37QB7cH9WEDNv+/FMk2Cj/RXwlPL1crLQVPW9AxMzzjnLRk0wvCgRcNYE65ijIvL1eCLb9/GWz//yNUexiXnPTcvv+C29//RC1n/8YsLLboW+Kc3bLyo429bXVcRN+j5+EuRmrgUwwKWiMGBEKhMDq0qiipaYEhy1V6MlImcTJG9dRMjN3gJDCvymy0yeYOBjynocxRT/H0elISZNSaBUzqZmI416aXhcYU3YcIjh9A2ryVysohWtLFrYgTGTBIrDfIxQdMDaaa7AzTdN8XYrhhaUVL7yBTpDseXMtsfZGh/CKvepnYvTLwsh5WTxPYaZBs0lATvpCC1TNH8iyLCtIkdyWCMTKN8HsJff5j0q7cTGAZ9Ku0oU3m9iGAAOzHtvKCUX1h0HaVLfvqTLXzo1d/mP97zCl79/37K3xxowZewKQY0ilEbfVZjYjxCwwOS0dsLhA74iPbOIidnYXRiUbldFxeXS0glN8FGWY
CyUaYTTqWyljOBUChgjo6BUohCEeExUIUiamIKzbbRTMcjYj9/I/zyccRDB9AMDzFtLXbEj56xEMEAKuBDRsNYPU9hvXAT+sAwMlaFikXItDuhp97JnKNSB6iTg079n7UrsT26M4EDFKq8yGQp3MvwIJvqMU+eRimFZ9rEMzSDdfQ4oXgQ7cZVaNM5uHEVHD+NMHSKN3Qg9h5Bdi93JktOD4OUiEgIOxxEDI8hImGKDVUYA5PY0ZBz3fkCUmgIXTq5bC4uLleET71yHTd3114yoyPi0zn4sZcyOJ3leZ+6/xKM8NnjuQDvRpmo/9mFFl1rCHFxBsLfvmY9VQHnGRH0XloTYUkYHMo0KwX5pFLYo2N49zv7TMDoO0lDOfzurrnjZPdyrKPHL+wc+3oo/3ezAe9haJinNh7b7WyvOeysW0Bwog5rdIzOHyzsK/qVpz9XORRrfmpm8Ft7KvsqlAr0CV3HLidfWhYkk7R/5Dhf+0gTksf47ppauni4cljVGeeLfHXhuV1cXK4gpXCqymo+73ye2axYqNTisIoFmKdEp5W8AeV22gOPo3C8B7S2YB05Vtkvf/4YChxBi5FRAoOOqp1i4d+/FnYMlDLC8OApFiptZE0c86QTtmmnUsifPwY1cQCMY0MVL8Z8tAf3O+c58747v9jo5BSi76Rz/zt9VhcuLi5XiPZ4gNdvawNgVcOZweYXR3OVn+0d1ezpu3gFpGeLK0dxbXHh5uFlROg6WjDoJE5nc+gN9egtzegdy9AbS5HNQmC9aNOC45Tfg3reBtKv3g7A1NudHIvU63cAUHzJnCqXjFejbVhTWdfCYWRVFLF1HfbzN6KFwwva660tTozy2pVo4TCj799V2TfzJqd/bcMaCrdvZfiDc/uKt22u9D8f8xZnu1zZ5VyO1+tcczCI8PtBCISuI6SsKMzItSuRa1cu7FuTyPo6TnxqJ4Xbt6K3NDPwZ7vmjolcfOyei4vL0sE8PfC0+88loW2nUgvWzyw8uFiIU3nbYsaGi4vLtUXINzen/Ey8AU/H+29dUVl+yVVWgIoFnpnX4kUray/TSJYGn/zNGy66j4Dn8vkhloTBgbLRwiEnJtpjoGIRrNoq7LAfFYsgV3QidIPp5V6Ofm4LiR+u4OhntmF7dSZu9JOt1hj6o11k6gXTb9mJsGHgQ7vIV+kUbt+KXNHp9H20n/wdWwEoblmBuaadkR1hPH1jJO9Yi3dklsTbdsKOGym2xsltXIbt0bFTKbJ1iqN3OQaJsJ3K4qfuqMI7niEfU2ReuZ2pt+8k1eZBW7/aKeq3az3WCzeh+XyYQYl24yqwbWR9HZrfhzB0hM8LloUMh0HTnFnJmSS9n95BujNK4F8mMX7eiFzZxak7DKyb1zO7sx3vtMCYKWA2VVPVa5Pf2InYug4RCTvhWl7v1fxFXVxcXFxcXK4igrnwmljQc0n6fOc8xaJXbTq77sWV5JmGD33mzZv56CvWnL/hNUpzlX/R7Sf+8mXnPXZFXYif/dEL8eiXzyxYGiFVll2pd0E6DaVZtjPdZfHP7Sb+OWc5VtpWe46QxNC85flhBt4fO/Iw8mdOHY/6h5zQp9A3BrGB2EGnnQCMeWNo/8icElbka054U8uPnf3tpZoygdL+cuKmGB1DltZ9339kQULnOSmpTHX9gXOO1Pfndi3/w14A/MD8P/NQ6TtQLAzjcnFxWWKcUZdC8/kW1Mm4kOOFx1MJ21qszoWMRMDvW+ilmF9w9AwWq5/xbIqPVupyeL1z4yv3F69GtdSjeo5dVB0QFxeXC2d146UNo/qXN29esF6O9b9W8BmSt9/UwSd+8Nyqh6JdQOL3PR+8+bKPY0kYHELT0Lw+lGlWHkaytraS13Em02/ZSdWXdjP4p7tovj8Fjxy6oPNUiu+dgVzRiXXsBOlXb6/kWlT2rV25IA76nH13tldUYRajrPYCzD2QNYnmMVBKIYTALhTRfN6nlbY1b92Mft8+Mr+5ncB396
B2rkfsPnDe8bm4uFx+hK47qlUeo6JWJXw+0DSU3wuGTrEmgNIEWsEmXetBz9gUg5JEtyTdYaInJb4JgdIg02jjTWjoaZA5yL8ghdUbwjYU0WOCmRfk8B72o2egGAHvFEROmkyu1Yn2dWJ5INWmET5pozTBzArwjQtkQSHzjqpergaECWxIks8aBA76me0w0cJF5IAPPSNQEmypELbAk3TqeZhBm/BxiXdaIQsKyyMw0jayoPCN5bF9Eq1gI1M5KFpok0mmXrPFmc35iluHw8XlcvOJOxeG2MQCBolM8Vn3d/sN10cRve///k382j8/eLWHcUW5sSXKwYGZqzqGJWFwKNs+a5bvXMYGQNWXHG9D86ceekbnOZc6inXsBMBZxgZwQcYG8LTGBlAxNmAuqRTbws6VdPtL+85XR0O/z3GnBL7rjNU1Nlxclg6V2Xtln3Mmf77DuuwAN5jzkD4t/7RwteZfF2/WPE/oIjpv+5mCE5eTsk7efM9u5GuDizV1cXG5DPiMS6dWWXWOfIn1rVUcOD19yc5zJVjXEj1/o2sQXTt3ONQ/vG4Dt/zdA1dwNGezJAwOoWnIqhgqn8fOZJBruhHZPNnlNRj37qskjo/d3sHki/J4fEVubBpi4hMd+A6c4vRbu6jdX6DvlRpN9wmm1ki8UxActclHBHX/dRS7tQEOH8O86QYsr8T0a3iSJtlag+gPe0i9ZA3eaZNc3CDyxDSTm2NUf3Ufw7+/BVuHwIgidWcK+UgEfRZmtuVp+LFB9PsHKW5bhcwU0UenmdnaRPA7jyJXdJDursY/nAFLIcenmd7ZQvTgJIxNorJZR+4ym3OMDE0ioxFUoYBWX0tiWwOJVRoff+NX+Mh33ghAYFAgbJzZyRxUfXk3E+/eSaodur7shKGJqRlHEte2Yehq/aIuLs9RhEDohlNxXEpUPo8WDDrGh1UKaRIaQmoIn9fxfgiBCgVAKVTQhzaWQKUzWMkkensb5Aug1FzY6TWM3lCPXV8N+8/f1sXF5dlz11u3nLXtYiRT37S9bdHtd65vuuYMDoBjn7yDFR/+8dUexiVl1/L4OffFnib8rf9TL78cwzmLJWFwKGVjZzKofB69YxnMzIIu8T3eB5GI8yDOF6i79zQ3vnua+/evIfW/apnZYeDv0Wn94SSDt8UJH4XQNx8iEg6TvmU1wfuPIIRAxGPIqSRmsYAtBZ6f7MUjBIm37CD2xd2YN28k+G3HYxACCrdupmbvJPaNK2j+0Rjju2rxJSzUD8JUH5pB7euh+qnNTHcJWLEMz6F+MtuWw94BIkE/oq0Z8gWCT06gvB7QNcyBQaL3zDpFtmIx7FwOYdnOS4fhQUgNezaN8HlRM0mGb6uh+52PctfHOujA8ehYL9xEts6DMWsROJl0inH9626nUvGabkTCiblWpgnZZxAX7uLicmlQCmUWUZaF5vMiSmp1QghEVdTJc1AK1VjjfAqBkgItlUMNjiCaG5i+aRm2LtAsyNRpNN49SuqGGsJPVZHujOKdKpCt95Ku15B5mFkBrfcUyNYaVN39FNTEUEEftkdiBQ2KQZ1cTBIaLGB7NPJVkujhaYo1AWxdgCYQlqIYkiRW6NTuz2P5JIFTSZRHZ3xzmNCQRbZaUnUsw+j2IP5xm/iDQwz+Wgt1j2WwPRrGZAYz5qcQMQgenwYhKMb8GJNpMu1VGKkiWZ/Ef2T4Kv9ILi7XPxvbYudv9Az4g9u6L2l/V5sLqVB+rfF0uRqXSjTgYlgSBgdqLszI7Dt59v5y8mIiwcAO6MZJwK45WEqSHoSGeVW17VQK//fmJWnPS3407i1leCtF7IvOi7z2wOMLTqfft6+SaG4B1U85ydo+5kKf9Pv2UXPfXLiA98eOFrX1xNFzXqZVqsBrlTTr5+vyq1JYZXm9+51na1vLnz9WSYY/M/3z6c7r4uJymdHKlcZLnz6vU3VcaKBsRLUjx6i8HpTfg+2RyLTzt57sjmF5whRuqaMYFqRbLa
oPOfkWKEXvb9dhG4qiP0a6WSMwLCmGHLU8pUHLz4rInEn0B0+SuHMdVT0zaIkUxY5aLI+GUIrgaBGZt5BZE62oU6gPIrMmsy1+CmHnISVzUAxDMaIzukVDbq6m5qCFMGG6Uyc4ajN5Q4DAqHPXM/tPEX+iFi1vok/kMGtC6IksxuGT0FCLyOYxTAuRzhL41Sjmuk48kzlmN7XA0yv9urhcl7xqUwvffuzy/+fXNUH1Ii+YF1MSTj/HC/q2juqL6NXlucTSMDhcXFxcrmWU7Xg3SquiWATpxU6lUKaJFBpIDRIzoBTS67wM2K0NVP2ij2JHA1rRQg5PkbipDV+iSFWvjZHIYXskidUh0k2CfJWiYU+OwZv81O8rYvo1lCYoVHnwrluOJ2WjTSYpdNTh6TlNbuMy/L86iqiOoTSBCnjRcibp9hCRvkn8YYPgkMIzkSbVHSX2kxxCKbr+cQSrowE5k8XbEMaYzFCsDuAfF5h+iXcy79Q1UmAFDRKrw8QOJxF5k8K6doypDGZ9FH1yFqu+Ci3gQx+dQaSzBDP5c3+PLi7XMVvaY1fE4LjUrKw/t9rVDc3Xbj5E7yfvoOs6C6tayly3BkfuFdvw/eBszVwZr8aavPDKmLImvmiBrEXbrl6xoCLwlSB75zb83zuHNrCLi8vlpxwXLYTjzZDSEcHI5UAItGAQNOHcd5RyQipnUmh+H2r/E9BQj9h9ANncBLZN9Ec9aNVVpNc04MkWEMLreGPfupOWu6cxq3xET9jkqiX5iEDYGrOt4B8z0HMKsa4JJQXpW5c7BsFNK9FMhdJAK9gUIzq+8QLK0An0DDvjlhqBYR+aaaOkwOxsJNPkIzKVphjSmVoZIzhqk63RsCWkGwJ4k4pclSiFfynGt0bwT9oUQhos8+KfcvywuXov3oQHLWsiAavkMXZxea5xMR6GZ3Sec5zo2aZwdF/iKuVLhXN5bZ5LhLxXzgy4bg2OxYwN4BkZG7B4Nd5ztr3CxgbgGhsuLlebM+pgLFCnUsqpcVGucyEEKptFFQtYpfDJsnqeOeioPMhIBJXLEzw4WNkmNq8l9sXd2IAeCBALBhco+dUsMixfeeGMWh1lrRmLhfLj+vRMJexTAMFSG9/R45W+zlTSCnE2wTPW55eiWrwSiIvLc4MrFX4kLrFp0x6/IA09lyXO7zy/g3/7Zd+CbVfKCIalYnCcccXa+tXYB44AZ3sk1PM2IH61n8l37AQgftduii/ZgnH3o3N1NHbciHboOGJZMyKZdvTvJxJYiQSyKorw+ysPeb2xAVUoYE1OoYXDZJ+/Cu+P9i4oyKV3LKvklpxZ1ErW1kJtrJJDkXnldgLfWSivW66VIWtrsZNJVD6PMDyVfA3noiXC0J2+hSD5hu3EfnQEa3qG4m2bK7kn5YJcZ9YUKdcSKavkLOjbxcXl8iIEQspK4TshJVq8GpVKQW280qZYH6Hol1geDT1rkW4wmO7WUFJhBhR20EIUNMKtSaqDGQbGN9BeP0lQn6DWF+TFscP05ev49qlmbmsa5YHRLjI/aCAfc2pwzLYpjJSgELPBFvhHBZYPLJ9CTwtkHrJ1CjNiIWzB8tVDjKVWk077MDwm8XCascfruf3Fj9KfjnPoWAtdHaNMpgMEPEWSOS+zM34Mn4nv4RBaAXJxiPQpsnWC6ieKKCnwj2axPRKUohg28E7myNX6yVVLikEB/+rW4XB57tFZu5iJvvT5wK0rrvYQLhv/702b+L2vPHa1h3HR7Oh8dsbs67e1XuKRnJulYXCwsOKufeBIZeatbGzINd1YR44hfuXoKcbv2u0ovgD+IyOo+rq5Ohq2wtzSjfbA48h4NSRT2JkMms+HNT2DKCk4yfo6zOERZI3zQmCnUnh/tLdiwJQr8M5PZNe8XqySwaG3t2H2n4LSDKEWDKLE2QUG9XFHUao8k6gFAmfV25CRENbMXHJ77ECiMtuY6PZSd2/p0srJ9WfWFBkveWKUco0NF5crjVKOZ0
OTjvytlKhUClU0EaMTICUiFsVzbAikxBwcQni9xJsbqToaRk4kMeujyEQGu/803NBFsquR5pSF56BNZlkjezcH6Du0kqlVXuKHsjwU3U54LEd2i6LxVwUsr0bNQRPPYIJMdy0yZ5GPGYQPj2PFguTq/AR395Lb2EGhyrn1G58JUNURpmlgFmVI0q21LO9NcPRb3cy2h2g0BOJUNTGfjm0ImodnUTJPtjmIsIr4B1Nk2iIoHaqO5cBWyJksWBZ2bQR9PIlxqkB+RT2+iRyB4xnSK88t3eji8lyhKepjaObaUJO8nkOP7rhOihleiOrWYrLIb9nZfhlGszhLw+BQIDweyOUqORPW+DiyKlp56bb9xoKwgGP/tJ0V73MMDDWbrig/AfDIIfRgEBsQhoFZMlq0eDXkchUPhUrNIgwP1sQk9s0bMUZSWE/1Ytz9qHPOdBq9uQlzcAi5ohORzYPU6P3YGnzjGsu+NULibTuJfWF3pX3w23soB1SUPSNWbx9aMIh1YxfGqfFKmAQ4lYmVaTrGhlIIw4Pm91UKDh770ia6PpNm5A920fJfgwuMH7mmm4Hba2j56SRWz1PoDfWoTBYrmXTixmcv3U/k4uJyAdiWkz9umnBmDc/59ygcZT7zRD+cKKnt9c0LOdrXQ7gkqGcCYmCQ+l8567U/cz69OKp5tXud9fLN3AQ8pUKkAeb69JWWjXunFoRV+Xrm1PYCe+eWg/vmxlruu7zP9/jcuu/gwsssn0/0zi3LgUEUcyFaLi7PdS5lUb6zOEeczFt3tvN397iKlvMRQtBc5WdwOnu1h3JRvPP5nedts7Ypcta2iyjN8oxZMmarsizQJNbEJGLrOgBEKISMOVrScsypXzH+u04o1Yr37WHoj3dReOkWxl69ipEP7AJA72wHnJd/ALsmht7aAoA1Okbxts2Yt2yueBlUsUD2N7Zh9JwiP6/6pDA8oEnsspTtsROceMcyVDpD90cO0vK/H8I6doLYF3aTftV2ZFcH2vrVgOPpcDpxfkmh64jGOrTHnnSMl6rSeTSJsktGVMmYUsUCdjbH0B871xPZ60PsPkDDpx9i9ob6yvhSr9uB9cRRGv/+IdTxk9jP3+jEhyvlhHk9TcVJFxcXFxcXl+uTc71Dvu8yhUZ96I5Vl6XfK8V9f3jz1R7CRXNzd+1529y5ofkKjOTcLAkPhyi/HCsbuXoFPHUSqqJQqsarBQLkuuuxdUHL6/tI9W1l4EU6nX/yEADTf7uDFV+e5uSHdyFsyHTHafyJgWfWwjeSwTw9gOzqILmhjtB/7YP1KxHBIDO/eSOxniTpOol3VSvJZR6CL9+Kf2AWs8qHEoJslc5Mu05yXYHgMUi8ZAVj2yDWI7B10EyI/9tuLGDs93bRWOxGJNOYW7qx0kVkqhZVHyfdESEQ9CEsBSPjCK9zXSqfx87lEVLOVR+ujtF8f5IT/3snZsii6u52tL+OY8ya6M1NTN3cRi6m4XvJFrI1OiMvsGn4hUZsMoKm69iTU06lcRcXl8vPvCkioRtO0T+P4Xg5pHQmHEJBhN+HPT4JloXweaGpHkwLpIYd9KJli1hhL7auOTkhtgJbUYx6KIQd+duqx8aw4iGGd4Vo/c4A6dX15GOSwGgRX88Ao6/oJNJfwDeUwqzyk497Ce0fAttGhQKImRQE/MxsrCcb14gdzeMZm8WMBdDyJkrXsHUNYyxVGZccGKfY0cDUmgDBUQtPqojx5CDZDW0Uw5JcVKP6ySwyXaAY86HlLWS6gO03kKk8tkdHzqSZXVuHnrbwHjoFY1fx93JxWQpcyWzdi+Btu9rP20Y+TcG5y8W7bz7/jP6Fclm9TS4VloTBoWyn0jhKofpOOw9p6QEpgSIIgXd/H9bkFE89fydys6DjvzNYL9yEsBQrPt7D7G1raL0nhe3XyT3lIfDdPeRevhU5NYsdDmP19hEeHkN0LsPa1wNruonf10exs4Gaz+5m9H27aPpGL2ZXE9rEDLpSWE
Ev4QOT5CNNNP5LntHtHqLffAylbaIYhNrPON4NtXM9xsAkjd/uZeKO5cS+cBStqRr2PwnRCPbhJ/EF1qHNpDH7T6GFw6iiiZVIOJ4USkUALQm5HFqhiNlWjZ4WxA8KvO/rJ/drdRjffwRTk3hSLUS++jAAox/axZq/GkZJDau3DxmvRgSDUCzAte0hdHG5NpgX6lnOn7JzpWAi03RCOMsqVWVyOSh5T8uUw4/O9E16Sv8qbY5B48NO6JS3/xTe0j4TiH9utNJO4IRRmZxN8ER/RU2q3LZ8FRoL1aRMQIyMEt+9cKyen47hYU6VygbkvOX512QCvlKYl6tU5fJc5v23dPHoyQSG1Dgxnj7/AS6Lol3JWKDrmMXyOi4XS8LgANBCIexUqpI4LgpFVCLh5DjYCtJpBv5sF+0feYjEW3eSbPeRatNo3J1DFQpMd0mKwSCmV1B9JMPgn+xE5kHp9YSOGOiZKLkV9ej370Nvb2NqQzXe6SjeHzkB0PX/9JDz4B0dg45l5BtCZOM61f2jpBsFSoRo/I8ezG1r8KRs8lGdwu1bidxzhLHXriUSNhi/0UPsmPN4twIGctNqrEcOOV6bRBpreNTJs7BstJIHwk6nHU+HZVUUquxMhtkWD7UHTAZv1oh8DbxTBVKv30H46w/jncwz86YdTGwQLP8fDzHwgV3EjhUJzmbAMFBeA6aTi37PLi4ulwEhKlXFnXVnWfP7HY+Ht2QWGDrC50MF/Yh8AWyFHQ1RrAng7R1FBXwI08KsizDdHcQ/bmL5NIoBQfWeUWbX1hLeP4wydNTwGMLnRTXXAWAffHKhBK4mkV3tMDIOrY3OZM6KZShdQ8uZmBEfwrLRh6YwBwbRO5aRXV7j3Ht+MYzSBCJfRPm92NEAcoOk5qsAACAASURBVCLp5L2FQyAEhdYYnqeGMEccI6esoDcfGYlUBDJkWwvpVbUEjifgycv+i7g8A4QQ1cB/Au1AP/BapVTijDYbgM8AERy78ZNKqf8s7fsCcDNQtqLfppTafyXGfq3xwZesBGAmW2T9x+++LOfoqrs21bBcrjz+K+jdWRIGh5AS4XHSGGVVFHs2jRYKYiUKzoPbLoAmaflLJ4Qq9kVnqk17/Q6M6Rz9f7KZtk88VOnPvHUzNQeKCFvh33MMa3oGWV+H92A/+Vs2w/37iPSfwrx1c0WJStbXIQJ+50E+NoHngRGMfB4a6ml4JIfx6DGK65eTafSi5xSxYwWMux9l4IO7CI7YaEWbpr99CPPWzQAYDz+BiEYcdapSfQ69pdnR4C8UEVI6Lx/5vHPtORstFqskv2fjGmNbYcUHHmbqB93UvXOY8K9GSb5hB1VPzCALPrSiYOz3dtHyn8c59ZblBPbaWCPXXhVTF5frBqXQAiXN+pJUrtB17GwOLehHBAKOTPfIOFY2h71lNUIphGmjZmchnWH6tm58UyaxQ0lGd0WJ9+QIDhRRQR/+4SzjL2ohNFAgvbOBqi/vRsaiWL19joFRXYW5shWZLjCzMkLswVMQ8MPkNPmdqxG2wts7hooEUbqGMTqDNT6B2LyWbNzP+I0ewgM2FIqo6jCZFTX4T6cQBRM1mcBc24EZ1LENjcDJGayWWhgZRWxei5hMoYdDTpjoTAoAK5l08vBqqynUhRGWItsRcw2OpcefAvcppT4lhPjT0vqfnNEmA7xFKXVMCNEE7BNC/FQpNV3a/z+UUq7e8QUS9Rvnb/Qs+fI7tl+2vhfj+StqgSNX9Jyuf2OOZyuJ+733Po/qoOf8DS8R5zU4hBA+4Bc4oig68C2l1P8UQnQAXwfiwD7gt5RSBSGEF/gSsBmYBF6nlOp/unMoy6rI39qzaUe1KZFwkqDLEq+244ifX78i/PWHsYG2M+ZR9Pv2VS6s7L63Rp2gYf3+yQXt7DP2n4k5MoocGcUGtAf3n1XoqvHvH1qwrt/nSLtUKg3P72tgcNFzlJW45rdv+MeHKIu1Vb/iaCUsIvI155pDByD0zV
K/QNNfj2KXZ1FdXJ4DXIl70wUMovSpVdbnS15rPh92oQi2hZW0YHqmMskBIHYfcNrh3Kv0znbCJ2bhcC+a10vd4z2AE+4kl7VinzxNrFTrM2Z4HOWn3lIhJ9tyRDcmJrGB8P65cCpZW4uRLKD2Hqps0yiFS3m9qH09+Fuaae33YR097rQZFHgPKuZng4ndBxYoXJWvX+3rWTR0C3Du5YkE2tG50DCXJcedwAtLy18Efs4ZBodS6ui85SEhxBhQC0zjsqS4ki+RACuv00rk1wobWmPP6rj1rVWXeCRPz4VIGeWBW5RS64ENwO1CiB3AXwGfVkp1AQngHaX27wASpe2fLrW7YM6s0nvWfrfGxDmZX5DQxeU5wBW9Ny2KUs4/23L+zb9nadKZeChNlpQ/7fn5HFrJnV16cTdP9KP29Tgy4XKhq1tlcugN9U6R0I1rF9wLy14V7cZVCF1Hb29beGw6jdp7qLIu486MmKyvc86FMyFizZesFQsfD2XFwPl9aCFnCkZWRStqgPPRm5sQhgdZE0eu7EK4kyJLlXql1HBpeQSof7rGQohtOPbjfI3jTwohDgohPl0y7s917LuEEI8KIR4dL9WmcnF5prg5HHO8fuuVK953MZzXw6GUUsxVdDBK/xRwC/DG0vYvAh/Die+8s7QM8C3gn4UQotTPoghNQ/MHEOEQqr4aMTQOQkM11yKKFlbYi7AUp14axvIpgoPQ8MAk1hNHmXnzDmLfOUh+12p8/VP0vbmRto85Xge5egWJDXFi+5ybWmZ5NbJgO56NmzZgeyS2IZhY56FxdwbLJxG2QpgKY2SGTHec4MFh7MkpJl6/nurP7ybzm9tJtUqav3GcE+9ejn9MUbtvlpFdYXI1Cv+YoOGXM2iZPPnmKJ6pLCJbcIqBTUxhd7UijjgzksLQUYViJe5aeAznIW9ZHP+TtRgpQb5aoQQICzr/dDfZO7ehpGDgZRZYgup9OslO6PqPBMqQaPkimBYiOQvD5/rGXVyufa7EvemC0SQoG6EboAnnJd62K4pUwu93jBGfF5XOIoJ+rJoIlt9AmDbpVr9zTQJ8k0VycQPTL0i1CQKjinQzFGI2ekrDk1xO9IRN+iZHOjsfh6Zf5NFnC1iGRO9cxviuOvwTDWhFG+9EFooWwpAoQzLTFcDWBaGhIpmQxDNjkqkzsHXBdDdolqBQZdN6j8XIdh3bgMhxiB9OM7l2FelWgWcawgMWuSqNyCmnn/CTCbQbV6GkRMvkUV4dlS0gmxuwaqNg2thbVpNp8MK33MibK40Q4l5gsSpnH56/opRSQohzP6+FaAS+DLxVqXLSEh/CMVQ8wGdxvCOfWOx4pdRnS23YsmXLxf/tXcO8aXsbX9lz6moP42n5tfVNV3sILk/DX/7mOtprgudvuAS4oBwOIYTECU3oAv4vzqzGtFKq7I4YAMoCv83AaQCllCmEmMEJbZg49ygkoqMVCkWK1QHSN6yg6BdYPoEvYaMVFYlVktCOcdqjU3yg6R5u+HCejG3x7r4G9HcFOTZgsazO4mWxh9l/Uxu2EgxOBLAHBHq+hmxMo1AlyNUq7JftQGlAbR7DY1LI2vRtArAwZzyEe3UyjV5kTmC+uIXfvqUXQ9zL+j87xXen/BjCZvV7hujPjZG3dVrfO8UvJ1ewIjTGr0Y7mR2MUwxESC4XhE55yDQIgoMKRAP+CQuf1olM5cC00EwLCkXwGJBzPBRWSy0fftU3WesdYrPXw4SV5v87/XL2fXMduaSNL5Jne9Mgv9d4P1O3hPiPkR10vGSS+/91B94ZhSwo/KNh1+Bwue657PemC6XkvSh7HRZ4GxdRpGIc6J9zMYd2L9xdDlsq+xRqFjnlYmmhAifUKTbPU6GYU6ACiD589nnK5aDOdLAv++HC9fjDzhdWJohTZ8gwzUr4qt7YgCoUHHnuMv1z47s2Ho3XH0qp2861TwgxKoRoVEoNlwyKRWOMhRAR4IfAh5VSlf9J87wjeSHEvw
N/dAmH7nIV2bzs2YXrXG5cB8e1xwUZHEopC9gghKgCvgtcdJUXIcS7gHcB+Aig+k5jZzIYI2FijwmsZBK9sQFzeASAqmWtmH9xmhPv3MknH4hiHTvB2Ht30fTt4xSnU3SpI6h8np6Na9FLcc8dzFXlDgiBtm4l9sEnSb9qO8HTGXjkEKnX7cBI2/8/e28eJddZ33l/nufeW/teXb23ulv7bsmStbEEjMEQkhgDJoSEkAy8JJO8M5OclzdhMszkPSfkxDOTyTKTTIYkDCGGsCVsBgwOwTDYkqzFWizJ2rsl9b5Vd9ded3neP25V9SLJ8qKlhe/nnD613O2p6u577+/5/X7fL6UmDSUhNG4TeHzvgjrrvSvXkb2vhR98YT/sWEn/z0U494kKoDj/Z9tY+Vv7gRFObVlPamAUWjREvkjisSsgNVKOjYxGEW3N7n/J2CT29LTbUBoM4uRyrhFhqYTw+RAzs3x+bSfQSd8f7abn8SKyZLHsyPMNJ/Ys8ImHPsr4Vh1ZhZM+6D48izaZA9NCzeZe7a/Iw2PJczvOTUsFrSYqITeuxTlR67qWGjh241ypzROeEH4/QtMW9JQsULF6OcdOp3BmZheUvOq93Vh9l1wlQctChkKuJLffhzMxhQwEEOFQoz/PY0nzTeBDwKO1x28sXkEI4cP9H/v7xc3h84IVAbwLOHHrh3z34900e7yWeFkqVUqpaSHEU8BuICGE0GsziZ1AvSN6EOgCBoQQOhDHbdBcvK9GWjWuNSmh62ixGCIWxR53JxyVZaO1NLvlSFUT+833Mr1W0fLdCiO/5apD4feRfc8WYv1ljMEpqhEflYd2oJUdjJyJrNqIF/qQqSTW8dNMf3A36e+eR+ga04/sJJC1CBy8QOmhtTQ/NYSViSE3r4ULVxD3bSLXG8YKCBwNSu/aQa5Dp/s7biAy8Ht7WP0fjmH91FYm1wdo+84Akw+uIv30IFg22uoVOPEQcroAs3m3CX5o1A0wolGEJnFKZbREHGVaroSmaaEci/LP7ODKOwQrvlBC7D2GXL8akkmKO5YTHGphdnWM+MkspXQKvayI/cN+FLUm0dpNiIfHa4VbdW6KidSSKfmoBxKNYAPAsRGGryFNOz+4UJUKVw3+FVaPXStosPrcCR2nXG5M0EjAHneNTZVlLcxyeCxlHgW+LIT4MHAJeB+AEGI78OtKqY/U3nsjkBZC/Eptu7r87eeFEBncJNZR4Ndv8/jvSla3/OQ0W//b+1fy339w/k4P4zXHhvbYjVe6Bu/c3HaTR3JjXopKVQYwaxf0IPBW3GbLp4D34qrBzJ8Rqc+U7Kst/8FLqZEW6SSUK6hQgOpPbcLRBWZEIzBhUm4y8E9b9D0i+JUd/4eBNyQxyhbjpTB9741THlWM7fEjE2kCwSr28SCZoxYTm8MEJhXxyBqk6WCtacEOQP51veQ6dBCQ79SIxlYztVFhRjpwDGg+WGTiQ/fgn3WYWS4JbJuiIz7Dmf09WAkLVJjp/2sb+rii/++WUylI4sksfW+JY50TpI6GqDaHybf7iAxVmd7cQuxKiuCZUfD70NpWwEwOSmXXdEXTELXeDaHrOGu6ed0f7KfZN8tnO3fhfGwdoS/FmXlPE9WYIn4+Tu7NBUZ+LoRuFNCPRRj9HztZ8+kcolRFGTpicBS8a73HTzC369z0apHRKE4+72YxCwW0TAZVKCzMPAB6VyfWlQFkNIq9aTnac2ew712D2HvsRfc/v3n8TglH1LPB9c/kCVjcXSilJoG3XOP9Q8BHas8/B3zuOtvff0sH+BPKB3d1s6kjzsP/c++NV74DJEO3Trr31eIlh1xertLUm9dkeOrMOO/e2nHjlW8yLyXD0QZ8tlYrLYEvK6W+JYQ4BXxRCPFJ4Ajw6dr6nwYeE0Kcx73lff+NDqAcpzFbxjAYNfG9usxFvVZ59ZOwFx9QAApEuHYdc53gotcGc/XHi5etWNTD2Py0+xivvTaB5fOaIppf5LjO1g
1oTz3X2Db9lPtoUbv5qNVXy2jUVbGpVBCa5hocAhx8nsNbJZCgmdNUH9yO73v7mR/HNn3Kfay84z78T7gnq/nylR4erwFu+bnpZTHP/E/6/YiAHxwFPgPNl0QEAsiMewYSAT9qXS920MDxu2pUU90+AtMdFJollaSgKbUZaToENq5ldl2cfIeGVlG0/DhLflUcKyiIXKmgz5QodUUJDuSZuDeBFXR738IjVWTZxugfhYAfsy2BfqIPWpoorUhjzFYpdAbJdUk6//4cxe09hE+NUF3WhO2XSFvhOzXAyLtXkLhgumIa2TJOUCe7JkRk2MI3XcHxa+6ygUmsS1dco1PTguwMqlxB9HQipnNg20y8dTn+nA1f85rGPTyEEGxdlqQnHaJ/snjjDW4zD2+9Wnnuutzu+rCbfLzHPryDD376wE3d51LkM7+6444d+6WoVB0Htl7j/YvAVSNXSpWBR17OIISUSH/AVXbxGYhwqGY8FYeRcUQggDU4RPHhnQz8jE3gsg9rTZGmxwMEsjZjWw30MrT9cIbBB+L4ZhWOLtCLiqYvHQMhkKkk1Z4MxtlBnKlpSg9uIXR5FlGxyK9LET08hIqGcPwGTkCnmvQx02vQ+qMp+h5JUclYBEZ0wsOKQrvAjCgSp0E4EBkyCZ0eZeDhLjq/0o/KFbF3bELrG4FEFKfvMmr7eoStsEI6+qGzqKqJKleQgYD7HYRDqJLrmq41t5Db2U1gvMLKPz/DE0cEoW17CEwoEheq6P9ymMHf3UOh12L1rx/gwn/djVEQxM85xC6WkJaDnCnCmZfzW/DwuLu4HeemlzkgUDXp23k+PHNeHAtTjlqxiJqcajSON9f6MMJ+PzIRd0tJbXcaIT4wTGhjL/Lpo4hYjGilChNZt6ejZxnh56axhkfI9F3t9k1rC1RNxN5jOIYPKSWh06M4kRCJA1liT2Shs5XwC6NQqWKcuoQ2m3czJ8kkmU8dQBg6WqYJa2AQ5033kvrMPvSeZah8AZHLI4TAKpcR2zbApRFUvuB+B4A4406wCL+fxGP75owRPTw8ljQfeUPvnR7CdbnZ4U17YvE09NLnc7fZ4PHVsiScxgGcSgWpachYCvvKEDIchGrI7Uew3Yt4+FtHWP21KtkP7SYbDqI0RejkMGLLMtp+PIt44QLmu+6l7ekik5tCpD6zrzHrL0IhzIiOHB2j+PBOQl8/gLp3vTsz+I2DTP7iTqJXKth+Da1oEX7uMqHzEabvzdC2zyT0wgjZXR0IR6GkpOlYAW0yz6VHWkmeNnGSUYptCmtwCLl5Lfr4LNb4OM76ToxsHLJFnP4r+NpboSmFPTSK0NxbDadcRgu6f+wyGkWVy2RX6bR/4wBPntsCQMcPC2j5SqPpvZxxSBx3f32dP7SY2GiQOjIFoxOIYNBrGvfwuF3UMhtCCoTu/k8qy3KfS+n6YiTiEPCDZaMqVezeVshXqOxYQWC8hCyZ2EGD0s5lzPTqJM5Z5Ds0bL8gdsnCMQSzyzTimZ3EDg1S7kriVwqyWZzJLKqmcGduXYGWryLzFRgeQ4SCYBjYrUm0oJ9KdwrfeAH7zHnMt22nGo0TmGqi0OrDDAlavn0RmtOIjmZmNiaI9pdwfBrSdFDPHMW6fxv+gWlEd5cbVPS0oXSJ1j+KmnRQh08yv3tMBgLIdAon64pkyC3rqTQF4Z/vwO/Jw2OJIpZo9/jLuQnf0hW/8Uo3kZv9ld364tqbzxL9s7kuSyLgUMpBGAZOqYyYmUXrase62I/W0YrQNZASdmzC9mn0vSuAHbNY+fcmQ28I4dvWTseje1HA8L/bQzVpI/Ydw7l3D1pTmsrmHgL9k1gX+wnttyAWI3Z0hNzP3Ufk1AS0hlC7NmMFBL7LU+Q2txA8NQ5B13E34TgUVzUxe2870S/tR1vZS+R8HxMf3U3mM6dRotU9HmD9m+2M/PYeOh4fxh4aQUunsDTpNnoP5hFrl+Ocu4QIh1FmFWXWDLukBprEKRbRfA
b2zCztf+yWSQWPhCh02Yi9x8j+0i7izwvKCcnqz0zhBHTMt2zD/51DhJp2YcWD6FYSJQRUPINED4/bQi2zoZyFxqULTEwXZx3Gx7EB/ylXsrZ+kx48NFfuGVh0mLqcrAXog8NYuzZimF3YzQlk3xCkk2jH+7Gz2bmb/vpxrwy4Tubn+xqTMMaTh66SxbUAag3osePue/PtB/UfHEZ0tKOKJabftob4t55HSyYg4EcYuvvT0YrTP+BOqGga1uDQ3A6OnmLpVoV7eNwZ7rL7xmty/9oX9Yr08FgaAQdqrvHRnpyCurKJUg1ZXK1cRhaKrHi63Nis8+maNGPtdeuf7yX9wDYAmv9yL6RTKOm698KcyotUisgLOva5i/jPXURLxEnvm8ECgn2XsP3+RtOjfb4P//k+Kj99H2O/uYfmv9yL3LyWpr/ehwLSp+yGPOTaP83jnDyDXQuVHdwLtM6i/opisTEr6hSLriNxzlWxshfp9bf/17lmsvjnXNnz9Kf3NW4oJn5rD63/okh+dh/C78c2a9+Gp1Ll4XH7qPdv3K7/O8edhLAAMTgM0Sj2mesrxGgtzdijrrWCMHwLGs1fLvbEJKpSIfql/Tgsck4HOHcRwJ1QCXuuGx4eN2QJRhzr2l6Z+tHdy12Y4rjLWBoBx3WwT52de34decVGs3kN4/uHF2xjfP/q7ZxcDnJzJUeLb/KvpbDi/87BRqO4c3xOljL0tWcbAc8Cucrr7Gdu4Vy9N46NqryyG5XWP5sLSDxlGA+PO4RSgDOX434J+XlRm9jQEnG3JMowEO0tVwUO8z2BFrwfCIBh4ORyjcmUBctDoYZqVD3YQIhGsKEv72lMxlxrXOAqZ2HoqOxM4xj1ZYsDFy12df/I4nHPH5OHh4fLikyEi+NX/4/fSd667sWkce48YilGabeZlpj/xistIZZEwCGkRGvKQKUCHS0UexPMduv4pxWaqdBLDoVmndnl8GsPf4+IVuZ/9+1hdCBJa9cUo+eaiC6bJTcdAqlIJvNMDSaQJUl4QNL+o1kq6QCOIRndoRG5BMU2QSXt4MQsoqd8VGOQPOMwvVLiy4FWUlhhQalFIZYVCARM3tH9Al99cjdtW0aQQnFpoIlNKwaYLIUYGkohpMLf52fZEznKLUGsoKTYJElcNJnp0QmNO4SGyugzJRgYcd3FNc1tvDR0VNVE6DqiKUXxU4qgbvKfuh9nb3EV//P4G5GXg1gRh8CoxsoHLmI5ktlKgIqlMzkZofOrOv7JKlrFRhueqnkqe3h43BaUahjr1YMBVa26YhiOgwiFEOEgaibnKljpOlgWaBpk0hRXpQmMlym8ZyfR87PMrolTiQmqcYEZAUdXJE8rJjcL0scVxRZJ67NFim1+ihlJrN9C6QLhKBxD4Ju2cHwS2y+Z6dGJ91sYeYt8u4/QqIk2nGfio7vRyxAZqJLv9OFoIC2YXgNGXhC95DC1QdC2N0N2lUHigsX0Kh0lIN5nM9uj0bovTyXlR0lBcLCAMG2UoSFMG1GpIiwbHAcVCWGfPIP5wDb4Z0+lysOjzp/+/BY2/v737vQwXtPcjT0cK5vvLh+XJRFwKMfBHh8HQJTK+F84R2bROn4gBXzvP8aAGMmNOvETBwGI4aqgtM5bv2n+/gFf7Xn3t9zHNNfmxZKIR4Hl7Gu8Xk0/FVxp3vUd5QW1yvX667ps7/zPsziXIQyfOxNYdwGencX3ViCd4vcn3RKxFRxdsE3lD9zHcO0nNc9BuGEA6OHhcfuol1UJGs3ibsBhoMoVnOkZ1MQEwudD1bIFWiaD8Puwz10knC9CMEB8MocTCZB4dhAchT02DraNsiz0tlbinx/Bef0WUo+7pUvxC0FikRAiX0SVy4hEHOUzKPckcXRB+HyW8AUBtoOYyeG/6INKFWWaNP31GWZ/YRf+gWn8A+CEA2jZHMkvDKFsG5Qi/OZ7MSaLhPaNQjpBcDSCOvg8WjpF5Ikiat0KQmcnXD+hcgVVKKKmZ3
BqPSx6zzKcRARRqsKuzfiy5et9gx4er0ki/iVxK7aQJd6R3JG8+1SlbiY+Xd54pSXGkvkrr6faZU8ndq0GWG5Zj5x0U/R2cxxxup/SG9cz8qtlOj7lQz8Beq2B0c5mUXvuYXZ5kPjn9qPXJCavhd7ZwfSeLiJf3k/+kZ0knr5EZW072lPPUXnHfYQuTCEqVaxLboqg8J6d2D5B/CuH3EbQeTf3Ex/dTfpEEdt0YHDoqpIBvbMDZZrYo2PIe9YhRiZRxRJOoYiWjOPk3TRqfUYU00QE/NjTMwx9YC2FLkXPN0vIp4+itTRjrnXNWuSPjqBet4XBN4aIXnEwioroC1PYL5y7Nb8gDw+PF2e+LG6x6PZqAfaiUsf5pY/2+Di4cy1Xna+u5atTX0c+ffTqxvA6tRLRup/RjYo1Y1/Yv2CdxZMV2lPPzY1lXulWo8z1yMkXPYbVf3nB67twItHDw+Ml8Mi2Tr5yeOC2HOtdW9pvy3E8bh5LJuBo1BrXgg0A5+ipuQvdlQEU4H/iIN1PzG03P6sg9h4jXmtpuF6wAWANDBL58iAAka+4PRhabX3/EwevuniG/+lZYN6Fcl7uremv9y1Ytrg+2RoYnPs8x15YsGx+X4qChm5//bHlfyx0H7VHx5D1WmxAPHOUzmfmLcfDw+OOU5uQEH7/3ERKKARSIAIBnOw0MpnAmc0hm1JQKpPb2Y0ZkvjyDkbewjc4g/IZVFrDTG7wE7tsU41IfAWH6PdfwFnRBSfOocwq2pqV2GfOo7e2YNUUpsDNntjj42gtza7RYHYGp1RG9nZhn+9HW7sCLJv8+jShbz2H2LAKMxXEf3oIq7uZSspPYLiINj6Nk46hfHojs2GuW4YxlsM+ewEtmUSVSq7vxos0z2tNaeyJSdcY8NTt/IV4eLx22LU8dVP2s7TzG6BrN3eG/66bCLnrBgx3XU4m+6HdnP3f29G7u9BWr1i4cMcmtFXLr95Iagte6m1zxVdi24arVtfWr4Ydm9B7lgFQfXA7AOYD23De4PqMabG54qv6/vTuLvexc5FlfC01KTevdbdtmWvGEoYPGQigJZPuOOelMQvv2dkYv9pzj7vvjvYF23t4eCwx6qWNlYqbea09OrN5rMEhnGLRfczlsPouYY2MEn1+jOQTLxB4/AC+o32owRFkoYTx/cO0/vd9hL72LMlTs8QOD2FuXQEStPaaDOWomyJZEGysXgGOjZZOYY+OYQ+NuqaiXe3YZy+AcmBgBDUwTOhKAZlMInMlfEOz7mTN/uP4v3MQdeQk1sAgM+viaBOzIAQiGnEzLGcvoCXioOsNkz8tHkOLRa4654KrbiUMn5eF9fC4Bn/9wW13eggLeCUVVeGlWBr2Ermbx363sDS+4fpfthDIYBCZTmFdGUBbvYJyT5LgqWGcdIzcSvcmf8VjDiMPdjK106TtyV2ERqv4Tl5hdGuEaiwKtKIXQasqMo8dQfR2ISomVnMMfdDNKhTes5PQcBlZNMn+8m7CIybGk4dQr9uC8+wJKm/dihnzoXWmyK72Ubh/N1bMZvlXwHr7fRRadJq+dQZ7coqRn+ul6VMjTL6hg1h7CnX6Mtqq5agrQyAl9j2r0CdyYLn5GhHwI8NhVKWCDAZQ1Sp2NoswfAjDVR3IPryZ9L/0Uf7+Mq4c7CDaD83F9ThKMb0hjpLLiQxWsYIa4RfGcOJhRNVCSYksllGFIlw/yePh4XELEYbPvakX0nUNtyyUZaG1ZFzVJ1/NjcK2ERUTqibmpuVU4waBsRKFrhChkQrj7+wgOO7gm3UzBqX1MRJn8syuihJMBNBbEpAtIJe1kV8RJ3pqEmFaBr4qQwAAIABJREFUKF1j8qfXIC1FcNItkiondUJjVcr3thA9PeO6j+sa02ui+NrC+LNVzv2ij9jZPeglRfO+LNVMmFLGIH56lvE3tpPsu4SdjGIt30bg1AAE/FjNcXRdw+xpoRrQ0PMmWqECk9OIUBBl6DixILJkIkoVlK
HD6et9cx4er03etqGVFZkwF26xWlXAkJTNaxVsLuRn73n5JUv/74Nr+Lu9/a9gVHeejrvMaXxH783JZN1OlkTAITQNabg9DDIWxUlFEU0bULaNqsUicnIWsTzGxFsqrF55kYF/Xs+qvzUpt0gqKYOJ968idtki3ufgHyuihECc6XNdvAEnFsKMGJS2tRN5YYrwPz1L8eGd6CUfgSkbM6Ix+dt76PjMSURLM6ELU6BrDL6tCa2sWP67+5Bb1mNF/Vx+m4/QsMCenOLif95Nx48sCu/ZiRkSiH3HsAG1bhk67a4MpU9SWp4ieOgiem83anoWEQwgggH3oh+NIpRCRsI4+QJC0xh7W5X450cx7V4yW0cpDLcytTlONSbwZx3G3lbF+j9+ptdAc7SN6Jdcjw5tZa/nMu7hcbtZJIfbkIsVoiGIgRCuJPd1kAODDbGJ8AH3sfmZhevURRCjB+beqxcvhY4vLKtMLCpbqhvuhVnYHxKbt97qecdzcC8Q0drzZE23Qh05ic68Xo9+97kYHkEDRCiEPa+0VEajyBEBUmAtkiD38PCY4z+8cx3/6u8Ovap93EgudnNnggN917YZmM+KTOSG6yzGyxLcPv72Q9vv9BBeNkumpEpEo4hQ0J3xM220kUkqzWF8M1XskVFy2zsotEhwBMe+tp70zhEmN4XwT1TJdWqUWhWhbz9H/0MaU5vinPnXIfJv34TYvhEsGzk6BVIQGK1gnznP+K/vxp81MWaqDNwviZ6ZxvZB9qfXUdzYTnlZgnJ7lNa9OUqtgtz7d9H3expmVGfVY1MoHUb/zR58MwL/dw5SSktKGUH+kZ1oa1YiqzboGrmNTeCAXrCobOl1b0g0CbaDKpZcJZhKxVWuKVcQQiDaW5BjfiYeX83wZJyZH7ZSyiiqUYGRV1TikrZv+QhMOyRfgFJaMvjxPZR/dgdWJoaIRRHRl3+y8PDweIUoNSeLW/sRfr+b4QgEGn0dWlMaGXWlDLWmNEitUYq5AKm9pNJJLZmce950Pe29ayMM34LXem/3NdfRW93SrXpJaGOI0TlJRhkKoXe4M6KL+9hUtYo9O4s9PYOWvvtm5Tw8bhc3w637Yw+uuQkj8VjqBIyry1aXOksj4FAKoUm3sbF20VaOQ7nJwNEl5hvvIfL0BQpd0Nk2RTWuGD6XYWqXyflf8qEk6HnBuT/eTviSxswq8I3p2D7ByO4YZnscAn70nMnojjB6awvJs1WEUhQ7AkT7JRPbU2SOVjGDAiUFesnG8UtGd0UJjiqEAz6fRSWhMfjWNFYIzDfN4Ohw+T/tQauAHVLEzsy4M3kRH5Pbm9AqDrM9AbJrg+i5qhtolMqoUgnhM1yNfgCnNudoGIiKSWbjGOYPm2j+eoBKQqE0mN5solUVWkWRXSXJdWrYfsgvUxizoCRI00bpGlheC7mHxy1lXnBR778SutH4AdBiEUTAj97S7PZ6VU1EZ6vba9acRkvG3QmRUAi9Zxl6VycyGkXLpCEeRVvZi7xn3XWHMN/wz56YbDzX0ik3aInFkBvXLnD81hLxqxzA9a5OqJqNgKgeBCmz2ugNmW94CizI1tT7UrTMYkFz0FJJ9M4OtJW90NJ01XIPD4+bx7bu5I1XugErMuEbr+Th8TJZMvkvezLr9jNYFlIIzJXtJJ++ggr6EeUq1sQk3d8qUr1PsfyPjuMUCgx9bA/+rCIwbRM/Msr09lZCoyUuPuQnfk5QDQta9+eQl0dRgD5lELscwOxtxbf3JNb2tUQu5DHyIVQt9GreP4XSBM7x0/ilhvzoDjL/ax+Vd9xH5+9UmdwlkFXo+fMTjL93A5lvniH71lUgwD8pMdMhfMf78Y1M0NQXgapJKBIitymDdm4AImF3BlApNJ8POzeNFo/hzBZch99qFRyHkcvraRl1GH5nla5/1Ak87tZQiPs2oQ4+z+Sn7iP1pGTwnTad39IIfe0AYst6ZK6IsJ2708XGw+Nuov4/ViunEpoGykEEg6
7Rn6aBqE2kxCI4IT+yGsdMhZheGUCvKISdYrZHoiSYMYUxKzBjCrG8gFXV8AdNHug5w0Axw4rIBP94eDs/u/UoRyc7yVd8BH0mubKflmiei8c6cIIOWlFiR20yz+jkuwT+KXDekiK3wkbpCiTIonSfxyyMoIlZ1lEFnQ3rr2DaQbIVP7tadDr8Wf7qRw+wYu0QcV+JI5e62Np9hbS/wJPPbUKf1rAjDsnnJdWoQJoQyK4kOGmRb9MJTDsEJkyqCZ1KVMMxINx9H3zHM/7z8Fiq7F7x8rKlHh4vBaGWwI1pTKTUTvGWOz2Mnzi+r/7xsFLq7iv08/BYIryic9M8n54Fb+u66+PzEtF7luHEwzjHXqDwnp3oRYfQ/vMA5N+4CmkpjLyFbUj8IwVE1axtqFFYkcD2CaLnZhBXRhHhEIVN7QQHczjHT6Mlk9jZbEM6d8E4t21AVGwq7ZGGYIeczkOlSmHrMrSqQzmlEz+ZhaoJUjK7qYnAlIlWMBEKtLFpd2e2g8rnsRf1bnjnJg+A7du3q0OHXl3Pwk8aPR//9qvavv/Rd77o8vd9at8Nezh+adcyPvmuTa/o+C9l/zeDG33OV8Kr/e5vJ7fi898shBDXPL8viQyHkBLpDyCTiQX+GWLbBqyYH9/RPop7VjK21cAKKXp/bx+DH99Dx6N7kdEoTi7HxK/tJjjhEH3i+UYNcf0Cr3d2uGVaM7PQ1ox9xr1o660tDL17BYGsg1FyCH79ANb92/APTGOfveCqRmlyTvIxFiP7M+uJ/cP+xhj7/mg3vf9+HzO/tIv459z39d5uMC2sgUFXWte0sAaHaipUekOXX4TDqGLRbbIcH0dGo6hyBZlK0PcXLVSKBvFnAzg+CI06xL6wH7FtA/meCL6cjTFbpZwJEBwqoA6fvF2/Lg8PjxehXk4lfIbrOB4IgFlFJOJgWjjpGDigAjp22KCU8SFssAKCUkYiLHD84Gigl8F8xx4cHapxSaplLaUmgdLBMUAJA70EvhkflZRACYgMKvSywjZg+E0pAlMJKglJaMxBtEUwV7ty24VWjcCUQ3btSiJXIDjpoFUcKjENMySoJgRybTeljEIvJgmOKRwD9BKU04J8RxOhMQdHF0hLUWjzER4GrWBiNyeQxSrK0BClEmL7RmShgh0NUG4OwuNehsPDY6myBOahPX4CWRIBB0KApmFPLIyKtcEJxNki9roe/N8+SLh5N8U2t3xBrynH1euIm/dlEUPj2JtXwv7j7sIta+HQiVqNtcCenW248mqrV4Bp0fxcHi1XQV0ewgG0suUGG1s3oI6cRJluzbM9PYM9O7sg2Ci+eye9/941/qtGa2UV2zbA4ATWyChaLIYydESj5EIiQkG0YABVNV3ZTKWg7l5u26AcnKlplj0yRvZXdpP8O9f8T9y3icLDO5lZrhGYUMR+dB57YpJIzzLym1oJHK59Z7EYqlrFqZqeE6CHxx2grlDVUKqq9zrUZ/nnmZVKXNWoOjGuj9i2AXX4JPUKbRkIuOVbAJpEFUvIdApMEwwDVSpds6/DNz6ODIcJFwrXPKafF0dva4WAH0wLFQtjJYLo0yXKnTH8LwyiUnFEsYyamka0ZhCGgTp0onE6Crzo3j08PO5qvGDF4zosiYBD2TZOoQBCoCWTOMUiqlJxVZzKFTjwPACpz+yjrnEy34V75Lf20PpntdcTk+itLVRXtWOcHsAGrCsDVx3TuTTg+mAMB7DrDt+4buXgSj823ovH0Px+Jt6+guRn9zXeD33VdSAXWzeQ+aua4/jhkw25SHt2Fg2wakGO6O50sytSQ/oM7HzeDThqGRllWgtKLpJ/N3csdfB5QgchVN937dHqv0yg/zJAw3EY3OyOF3B4eNw+hO6eTpVtg1LIQABlO6ActKY0qlhCpBKoqWm3VwuQiTjmijb0bBEzE6ac8hH98XlUa4Zqc5iZXh+FToGRg+C4wujZCQJsn0BaEB4oMbs8SGjExJit4hx4Hm3NSp
z+K26zdlcnTjLiNnzbNtbaZVR39GCGJEbBQS/a6Pmqm42QkvzqOOFLeUZ3xklcqOIYktBAnsl74sT7y5TTPnzTFr6JAtlNCRKncwhHwegEwVwRpymJGBlHCYmIx3AGhhsSuTIUcj+3pkH5ul+jh4dHjeWZMBdfhi/HA+tevcqVh8etYkkEHA2UQlkWQtNQuDfsdUrv2kHw6wcWrF59cDu+7x0idnnhnfXkA73EP7cfG9f12xoYBNzZ//o+VaUC0CiXmk89o1EvybIuXQHAl++99rCPXF3OJAwfyqwu+Az1YADHxilfHQ0os+qq3TjXjhTq5WPXw7l4eW5fthdteHjcThqTBbWM5vxzizU65k4uzDsfADgjZfRiCXt2Fok7oWCDO3ECNNd6LV6M+N6Fr+vnmUZ56qArn2hPzyCeOYqfhVkMxTwvjxPu6+bDNHpR5ntw1Cc8HCB+fG4y0waYnIIr1x9nQy73ZfSxeHi81ljfFuPUsHue+IOHNvKLf/vsS972N9684lYN66XzChzKPV4eP7X6ajXAu4ElEXAIXQPbnQGbf0Mtt6zHOXoKLZ0i+PUDCMPH1Ae2MfnWMl2f1xnZrbPizDJCX32W6Q/uJpC1CT99hkKrpPJ/70HYiuZDeXRdQ+ULWGu60Mdz2GcvuMf1+2H9Sqy4H+2Hz1F9+32ETg5jN8XhyAylt9+LsBXh54ewBgbRqoqpX91N6jP7KLxnJ74ZC+P7h1Gv24KWr5DdFCf9zDBqNofw+XDSCZwTp5H3rENOzKDKZbdfY2wcZbqBlQj4ET4DZzbf6O8AV/P+9G+HWfc7lzn35x0EjoZoPlzBf/wyA7+8irb/thctk6F8zzKCZ0ZRfgM1MIzW3gWVqnvzM37Nr9vDw+NmIQQIV+JOGDo4CmHo7rlFCtB1N7Ph96FKZbeno6XJ7SkLB1CGhqpaSNFJsStC5OgQdksCWXAnRBibRG5eS6krijFjoufc96vpEEoXaBUbY2gGqzmGliuDEBS7ouglm8Apt7zJCbnS23K2xMzmNEoK4i9MuzK+A6NUN/dghnUizw+DpqGkwImGmLg3Rry/yvQKH0ZREZiyCfXPoK4Mg+kalpJJwtgkKl9AplOuNO66VViJkJuJCej4JotUMiG0so1+6DSitwtOXfPb9PB4zfPV39jD2v/4XQAMTfKHD2/kP3ztxEva9t5lr14S12PpEwsaN15pCbIkAg5l2SCunpV3jp5ysxKTbm+HMqskP7uP9LH1OEeP0P3EnNtt4jG3/MgG2v94bspPMbeOmJzCntcNpSoVOHKSun2K77sH3XVrJViBb7kZlfr2gccPzDkB/9PcrIN45qg743dsbl2tpRnnhKtb7xx7Yc7Zt/ZZkBrKrM7VebPw8zvHT5Pavxt7fJzlHxhH72jHGhzCBtr+mxtJ2OPjGN8fZ/58oXPpRaYYPTw8bi5KUfftVtVqTQq3NsVn2+AoZDrpeg3FoqAUdjzkNo0bktzyMJHLJQCErTC70lSTPgotUZSA8FiS8N7zlLYkKTTrhEcMpOlg5E0qKT/Gk0cQnR1udtinIy8MEhnLolJxVLmCffIM2spehGWT29JKtL+InCnihAMga8IaQmD7BSpfoLBnJXrRptBmEBmx0EoWLc/kGX5TGl/eAcvGKRTRu9pxRsex1nSg9w8gujtRU26Piv3COaTfj6pU0AGRyRCc9OEkojhbVqMP3noFGw+Pu5WAobGtO8nhS25mc+UrcPx+tbQngrf9mC+Hj75x+Z0ewh3lk+/aeKeH8IpYEgEH4N6A18qcgEY6v16SpK3sBZ+BfeosztG56bH5fQsvRr3E6aWiNaWxJybR1q/GPnX2quWlh3YQ/OZB10/jGmOwR8cWfrxF5VBCClQ9CqmXUS0qpWr6m7kSMmteoynQkLX08PC4wzQcxqXbn2DbqFLJNfEUCpXLAyB8PpTjIHN5dz0gOahj9jSjFaoEDpxE62zHGAL/t2t9WbXyzsTf73OzJr
SjRF4aRgGuCOZJtagOkNV/ST2gwSE8y7qsGKJGo1keHt92I46TGs2NYSy3AOBXGjcb6kF5C21WTxfAxRz24MdqXILV2rJo+2vBe2QHAOUnd6Ich9lf2L3oc3dqBrOl+Yr7Wz1dmBt6mfnQwwBkf3434//q4cs3FLnsGswG34PaTKV8S4fjIOEwYof8GJZYDDOTYeJXljge/vWamcyiNiPiF9Gxerv9OBCRIB5Eo9GsGJZ60PZcvFIJZ2AQPNd3w3zxNdyjJ3GPnvSzQL10CPfoSeT5gzjnXke9fJjI11+66oO7+wZuWM7ZgcsmWtyJSfBcnMEhX7R8HmdoGLf/DM7weVBqoe6POzm1uMbRNVjKb6c1V6PRaDQrmxXxVKqUQmwbM5lE4jFUXRIlgnrLDoyhSdz2DVgTc8x0hgm9bxfZD2cpHq0jfe8khR83oqx1VOMKp64FqQrR8yaF9RXSB/xMMO3PjFNtTmDNlnFjNpPbIhQe20NyQMGTO6nGBXPrRqqZKKWmMPGhArM9cZzIOqySopIQJvZUSTXmSfxNitl1JpFJxdRbKqQG7mO210BcGPvKJmYH02z+xBBYJjM720CEUNZlttui8VARNVvCbG2C4RFEBAncLObTUBqxGIRsMifLDH58D5WMR2bTFM3/Kk9haxsXtoVoOFLl9V9wsM5FcGOK6IhBw5EqhYiB1deEUfGwZ0ugLdUazZ1H/GJ/YvvDq8Tj4Lp+xXDbAttChSxQirm+NJ4lVGNCeNZjaouJOODdP4frGFi2y462YV4e7KQ6E6F73ShV1+TJ9iN8et+jhBIVKlMR7FkTJ+GBAjNvgAFu1CM8aYIHTkIRuyA4Ub8goFGBfIcfm1ZNKiprK8i0zeb7BshXQ6RDJQ4e6+LtO44yXYlydKQVdSJBtc4j2p6jOBcheiJMocuhri3LzGSC6JkQZhmsPMRHXUJzLngKs+ph5irMbkwSHasy+FgIfltnqdJoNJo3EytC4UApvOycbxFIJai0JDGfe4XC+3eRmo5iHX+d6Sc2MvawC5bC7K+j8wcO6sEKORtS5xTFJiFxwGB0N8RGFcV2k7lej7pjggrbWPtOUHl4MwB1p6vUH/cID0yR39SEMmBiZz3hWY/k8SmUaZJ+ZZTs9mZCcy7ZrhC4wtxAGqPFr6jrhoToyTBDb/P/YJtfKTN4b5jwmIlXl6TcniB5No9RrFJuTZA+52DNFJHJGRDBmZld8NU24nFUvuD7eBcKGMDrv+Kw4bcHecc3jvCpo28h0xkl9J2X6Xg+ibe5m7ofJpi+x4NUleZvCpNbw4SyCrPsIUotWFc0Gs0dRoyL9SlEkHwRIx5F5QvgeYhtYSQSoBSpZ0f9dNeNdWAYhGYTRPvHcP5g0Ldcru9mMraGnn2HsLrX4p4fIVR1eG7Xbjb3n10UcC7hMGZ9Bnd8YsFyqsqBm1ZNAgtj+2a8g8do3NKHsk1kZBKxbZzzI3ib1hGO2lRKETaXxxk6lWP2F7ex7tlzOBcOXXKdghEO45VKXNkeHGwai5E+WAEx2DDcxZk32F6j0dw5NrUmOT4y98YbajQ3wYoLGl/JmH3rrjs95W2RI5PBnZ5+w+100LhGc3Pc6NgkYb8OB4b4MVsiiGUjkTBeLudPNtRWICcocJovLOkq5T16H8YPX71MebhemZaKHzObmnDHx2/omDeC1dvNt0//Jz02rSBE5GeA3wM2Aw8ppZa0j4vI48AnARP4b0qpTwTtPcDngQZgP/BBpdQbBmTqoPGVwR999yR/8my/DhrX3BKuFDS+ImI47hZWgrIBXJOyodFo7jAiC4sql1Gui6o6vnuVZSO2hTc351t0g4QWtQqANzd3xbgMI0gNfqPKxqXnquVOKhsAzplzd/R8mmviMPBPgR9caQMRMYH/AjwBbAF+TkS2BB//e+CPlVLrgWngl26vuBqN5m5jRbhUScjG7F2PFEqgFMVNrXi2QaHZou5UkXJ9iGyXhfHuCWbnYkSjFd
7W2c83fvgAyoBwex4RRTxSIREuM1OI4ngGxVNpzBJ0fL+CFzKY7bappvxqu54N5QaFnRWqCYVZEawCiAvJQY9KXLDKisl7hGpzlY09F+hNTvKdH+1APFi3Y4jpUpSJiSSqaC34TrsZhw2fqVJsjZDtNDHLiuSwy8w6i8ZDZezZMka+jMz45ktVLIIdQiJhVLHoB3vbNqf+Uz2/uu0H/GbmHG878hTnBpuQnInVXKRatOlbO0r/YAuRU2GM+2fJj8dY/7cOxaYQ4SmH8PAsHF/mL1ajebMhwRyOsFB1HMCIRvDKZT85RMjGSCXxpmeQdAqJx0ApP03txDTu6Bg8dA/mVI7s9mYiU1XCp8ZQhQI0N+Ae60fu28rkjhT1x/KY2RLKNkEppOyA51FZU4cbMgiPFZGqS7EzSez0FNW2FEbRoZoKoUzBrHiIp/BMA3u2hFGs4qSjOHGL6OkJSr2N2N8/iNnRhgrZFNY3UE2aiKdI/MN+zMYGKhvbMbMVjEIZqTo4zSkq6RCxoyOoWARlW8iFMZyNnZi5Ml7Ehr06hmMloZQ6BiBXd8V9CDillDoTbPt54CkROQa8Hfj5YLvP4ltL/vx2yau5tTzQ5SeiuW9t5g221GhunJXnUlWTsrYWc+N63BOn7rBky8AVrv9G0C5VGs3Ncbe4e95t6LFpZSIizwH/dimXKhF5P/C4UupfBu8/COzCVy5eDKwbiEgn8C2l1LYrnOOXgV8GWLt27QMDAwO34Uo018tUvkJ9PLTcYmhWASu7DofUFPwLHrbnfZXFDoEhSDYHQP+f7aLlx4IyIDZaxX5mP8a9m5i+p47ohEN0MIubiqAMwcxX8A4e84MkXzsOO7fBS37go9ghUN4llb79h32zpRlVKOLe04s8f5DSTz3E0NsNMseE+IhL/IXTuBOTqLfsQH58AKujHWf4POrh7VjZEgyP4OWLCzVFjO2bMbIF1EwWd2bGD7SsVEF5GOEwyvUWfL3ngzDNujSFPX2Ev7mPsV/fQ/OfPe/3S0M9Xm8HRq6Ee6z/ouiXFEw04nHI3dZvTaPRLIVhIqbprysP5amrprAVO4RyqrdsokGjWQoReQZYqvDCx5VSX7lTciilPgV8CvwYjjt1Xs3V0cqG5nazIhQOEQOjvg5c10+Lm8sjVQf3bfdjzVUwBkYhGmHyow/zb972TT6ZeTvNXwsz8KRN38gmTnw0DYZCKhab/qjAqY/WEx80sHMxWiobUEphblrPyP0Jot27ybcadHxtGDWXw52YZOKXH6Zpf5ahd6bp+MTzeGuayHclSJycpfT4TobfZhCZMJCnJpj9TiNzHX3MdUP314tYW/qo1McY/JUuur+cxYtYkC9SeM8O7DmX8MAUhfYEpW1pkkP12CNzSKUKo+N+gcFKFRzH9/VOJv1aHMD451p4au0P+exPvo0n3rGPZzJ7EAfMMhTaFF4oTuZoI9NbFPWHhNk+aP+hS3iijDVbRLJ5rXBoNMuB56LmFYxrsFiqasUvuKdcv7p4OASxKMq2KK9JY8+UmOtNkG81adk7hxcyQYRqwsKoekQGZ1G2hXfkBOy6B3E8vJCJUXYwhyeOJb7dAAASCklEQVRQ9WnU4AWMeAzlergTE5h1dRCyffetGsytG0EpCj1pjKoiu9am/kgBUQolgj00idtSR7kxSrHRJP25FzHicaSrA+/0AEbXGmTOz9DlNdX5Ez2XYLW2wIVb09Waa0cp9dhNHmIY6Kx5vyZomwTqRMRSSjk17RqNRrPANSkcInIOmMMvHOsopR4UkXrgC0A3cA74gFJqWnwn0E8CTwIF4J8rpV656gkMw/d1Ngy88UnoXYsXsVAiuBEL1dOKUaiSGqjyyW89Qd8Dr3N6axfrP59j+F31hCchNA0t+/JU1zaSOSzkO0E8AcfFO9aPsW0Tbd85j9uQxHDiVNbWU0m14NnrSQ76Vg6zBMaOLRRbYjhh35c1PFogNJWm0FXFeK6RtV8bov9XOmj/sc
vYzhi5rgixYQPxoJqJUElbxF51SfTP+g/9hhDrnyDcmMQLm+C6KNNAohG/EJZSiCFIyAbLQhWKSCJO4/9p8XdPvJ0NnzrO1+p30HHUZbbHxCzBmu855Fsssuuh7rgwda+HSriEsi5WtoQXtjGKxev+MWg0dxu3fWy6WYIU1WLZFzNW4VsgvXzeT4VdKi1YQNzpaazutQuB1dYJUEBiPySCQ85n+ggHr4tsJy++hgIEfz8HYGQUCILS5/e5QuIJ98gJ/9hH/fcNNZ/J/PEGhwgRVDTHLxDI0ZP+/rWJNS6MLHkOJ5BHc9exD9gQZKQaBn4W+HmllBKR7wHvx89U9WHgjllMNBrN3cE1xXAEf+oPKqUmatr+AzCllPqEiHwMyCilfltEngR+A/9PfRfwSaXUrqsd/0p+0lZrC87IKGZd2k8xGbJxJyYx7t2E99px8u/fRfyLexe2n08dOY+5pQ/36EnfchCLLprNuzS17MIfP34V8NpKvdaajoWqulZPF87ZAWTnPah9vnuW2bcOcdwrZl8xkknENKCjFXXar/CrHN+igee7dYlpoqrOIreoRX0RuG0tpMUMZk5H/7c9JAddYl/ee9k+2k9as9pZrrGpRgAQAzFN/x4H303TNBER3z2ysQGVy2M0NfrB4fEoki/6SSJsG2yb6poGzGIVJx0m3xrGcBSiINdq0rLPz2w1/NYUZhlCcwrPxC9KmhQiM4rZHoPouEIZkBxymNpsY5ZBXEXivItZVojjMbc2hFlRlOoN0mccShkTNwKxcRdlCLGhAsNvSxKZVMTGXZyoQbHBAAWJERc761BosYlOukxttEm97iJKUUkYRCZdwtO+QqUMwUnYhIL3XsjEzFcoN0ZRpvCDb39Mj00rCBF5H/CnQBMwAxxQSr1bRNrx098+GWz3JPCf8dPi/qVS6veD9l58ZaMeeBX4RaXU0mnRatBpcTWa1ceVYjhuRuE4AbxVKXVBRNqA55RSG0XkL4L1v7t0uysdXwdm3h60wqFZ7ayYscm4GLOxYNUIhfwK46aJEY1AUwNkc77raCKOyuWhqR4pV/ESETg9iHS0UthQD8p/aI+9nsWNh7GOnIXWJiSbw+1sxpzI4g5dwEglcCenLpskkXAYs70V5+wAVlcnqlBC4lGUZaJGJ3C39SJ7D6N2bcMenADTwG1Ko14+jPP2B4icHEFFwzjNKfLtYdIHxvHq4kixind46fR3Zl0ad2b2yv3juWCYGPEYT2f/So9NGq1waDSrkJsNGlfA0yKigL8Igr5aav6oR4CWYL0DGKzZdyhoW/SnXpupIkLsWq9Do9FoalkZY1NtzEZgfVyoe+E4uOUy1D6Mz1cIr6kUDsDJ04Rr3JI8wNrSh7eh0099e/I0jIziAMa2TVSaYoSG6nD7z6D2bMc8eAo2dGFMz6FsC6unC4olv9ZGUG5DLAvZexg8F3nhIAtpMwb8rrH+cT90tPtJO/oDV66+dciR00gsity3FSNf8hN5RCMUNzQROTeNioWptPYRee4Q9HVjjE2jCkUwBJoacE+d9ZNeXEkp0Wg0Gs2q5VoVjkeUUsMi0gx8V0QWTXEFPpzXlW2iNlNF2mxUZjIFpum7GfR04h7rx2xphoY6pFSh3FWPEzUJf3PfwjEK79uFVfQIfXsf+ffvInlylpmtdSQHitgD4zjD52H3vZiTObzXh1HlMlZvN86Zc5gb/bofXn0SKVbwUlGUIVQyYULfeXlhxtDq6sQJ/ojlvq0U18SJfO0lZj74MA1Pn2b8yXWkXq8wdl+YhmNVjKrCfvplzKYmyveuJXx4EOrTSKG0cBwjHkdsCzyFm81erEoMKNddmCU99/8+zLr/Mc7Yo02kz1SwcpWFLFvF9z5Ers0kcd4l+pWXKL9nJ9FnD2G0t6Lm8n7Ofh00rln93NaxKSX115dF5zZkmnKD+IhL8Q4fx+RiDIc8fxAP4MBR//UKLMrMdwWc4fOLZZhXggoFmJhcFDdinx1YeB
/CV5J47fhiGQIl404XGdRoNBrNyuCaKo0rpYaD1zHgy/gFgEYDdwWC1/kAiStlsrjaCXCzWdzZLBIJ+8rG+h7EMHCPnsQ5O4D5vVdwwwbGDr+w6dS/eBiz4lGuMzHX9xD/4l7U0VNkDk4hzx9k7N1dmH3r/CDK4RFf2Whr9eMsdt8LU7M4g0OokEWlw3cl4KVDRAd9t4R59wRnYBCrtQXj3k2MPpwm8rWXAAjlPdzRMer/6gWMskvHn+5HGYL9tG8eVqUS1rP78SancOpiOAODmHXphWBRL5df5ALhpwT2/FlHy8bc0kdsRHBPnKLhv71A+NUzWIMX/6xj3zqIZwvRr/jyhL+xD69UQhmCl81iJBNoNKud2z42XSvzBdMM8+J6YO0wIpGLbfObW4vnesQOYWYyWB3ti9qt1hZk5z1Yba2LtgU/RbYRu2iBkXDYH/MCzKYmzL51mOt7fNHi8cuPbVkY925a1G5u6PXlnXcTw49rq71Oc0MvVlfngrxWRztGMokRj2Nu6PULHFoWZirlx6rNn7N7LWZD/aW9p9FoNJpVzhtaOEQkDhhKqblg/V3A/w18FT8bxSdYnJXiq8CvB1VIdwGzV/ORBt8nwlrTgTs+gUSjGGvboFihsL2T0IURrNYWVDxK6oVznH//Oko/vQezDPV/uQ9594O4jUlmd+8m/bkX6f9QA+s/vwW7oJBKFXlwG3JqEDPWgHNhBHPrRtwXX6P66H2U9/RQaDbJHC9RfO9DVBIGRhXqxhqQRBwcf96u/9fWkgom+M7/1h5iFxSeBamOds6/t5uGoyXGPnI/lbSQTOym/rlzvktF31qMkgOOhwJoakBGJzAzGT9bVaXiW3XKZYxkEi9f8H2c00lkrsDMVoe5v92BWzbp+8h+Ko/vpPJIF6ljs2AKqQF/prL8xE7KdSb1e0dQ4RBGZztqJrtET2s0q4c7MTZdoyBB4Pi8UmH68RsifrxGJIwohRGLIZGwn41vegYJhTBam1G2Pww7jQnMA/2+ErChBylXcdNx1L5DqLo0RiyGVygsJJZwJ6cWiaHK5UVZonw3qmCSwjD9zHiOgxGNIIkEKp1A8gWM2TweF4uruqcHkAe3YQ6OLWSUWpgcCSw4bv+ZRedeZBGp+czNZi8qKZkMXiLmp/2dQKPRaDRvIt4waDzIPvHl4K0F/K1S6vdFpAH4e2AtMICfenIqSD35Z8Dj+KknP7JU1dJa3mxB42JZV3ZrmA+uvAXooHHNambFjE3zCochKM9Pcw1gxGK4Ob8mBZ67oDBAUMAzm8OIRvDWdeJFLMxXT+DduwEzV8aLhzFypQVXTOPeTXgRm1JzlMhYEal6lJujhGYrGIUKha4UiVeGUOkEyjaR8xN42ezFOJIAc0sfVKqo86PMPXEPkckqodE5lG1i5Ep4o76CYqSSqGSc3JYGImNlvJCJMgWj4mFNF8EAOT+Oam/Ci9hYY7NQdfCmZxausbZ/zLo6Ktt7EKWwZko8feD/0WOTRgeNazSrkBsOGldKnQG2L9E+CVz2T6x8DeZf36Ccl7GQBvZmjnFJCtzr4hYqAPPUKhvzyoeEw6hy2U+Pe4vPp9GsRpZ7bFp8cA/lBWlxPYXY1sLsvllX539eqS64NUk6hdFYj2RzGOfOo6an8QCpejA+jTEeWCgCt6b5AnqR+dMR1MEQwVOKyGtBjYzAQcyIxzHrM6hS2a/t0dsNxRJqbBI3CFSPfylIpd1Qj2TSfmxbQz3u5JRfW+MCRAOLyUXnKhbHZgTHWnL6ZL7ooVLQ1kRodA6nPo460n8jPazRaDSau5hrSot724UQmQNOLLcct5hGlt9xoEsp1bTMMmg0dy0iMg7kWf57+Vaz3OOTHps08/fXwDVuvty/2evhbpIVtLy3m7tJ3lsh65Lj+0pROF5ebeb11XhNGs2bkdV4L6/Ga9Ksbu6m3+zdJCtoeW83d5O8t1PWa8pSpdFoNBqNRqPRaDQ3glY4NBqNRqPRaDQazW
1jpSgcn1puAW4Dq/GaNJo3I6vxXl6N16RZ3dxNv9m7SVbQ8t5u7iZ5b5usKyKGQ6PRaDQajUaj0axOVoqFQ6PRaDQajUaj0axCtMKh0Wg0Go1Go9FobhvLrnCIyOMickJETonIx5ZbnmtFRP5SRMZE5HBNW72IfFdE+oPXTNAuIvInwTW+JiL3L5/kGo3mWrhbxybQ45Nm9bCS7kMROScih0TkgIi8HLRd930lIh8Otu8XkQ/fQvluyX1/JflE5IHg+k8F+8otlvX3RGQ46N8DIvJkzWe/E5z3hIi8u6Z9yd+HiPSIyN6g/QsiErpRWYPjdYrI90TkqIgcEZH/PWhfcf17FVmXt3+VUsu24BewPQ304hfOPQhsWU6ZrkP2nwDuBw7XtP0H4GPB+seAfx+sPwl8CxBgN7B3ueXXi170cuXlbh6bAvn1+KSXu35ZafchcA5ovKTtuu4roB44E7xmgvXMLZLvpu/7q8kHvBRsK8G+T9xiWX8P+LdLbLsl+O7DQE/wmzCv9vsA/h742WD9vwK/epN92wbcH6wngZOBXCuuf68i67L273JbOB4CTimlziilKsDngaeWWaZrQin1A2DqkuangM8G658F3lvT/t+Vz4tAnYi03RlJNRrNDXDXjk2gxyfNquFuuA+v9756N/BdpdSUUmoa+C7w+K0Q5Bbd90vKF3yWUkq9qPynzP9ec6xbJeuVeAr4vFKqrJQ6C5zC/20s+fsILANvB764xHXfqLwXlFKvBOtzwDGggxXYv1eR9Urckf5dboWjAxiseT/E1TtlpdOilLoQrI8ALcH6artOjWa1sxrvWT0+ae42VtpvUwFPi8h+EfnloO1676s7fU23Sr6OYP3S9lvNrwcuSH857550A7I2ADNKKed2yCoi3cB9wF5WeP9eIissY/8ut8Kxagk0VJ1zWKPRrDj0+KTR3BCPKKXuB54A/rWI/ETthyv9vlrp8gF/DqwDdgAXgD9cXnEuR0QSwJeA31RKZWs/W2n9u4Ssy9q/y61wDAOdNe/XBG13K6PzrgjB61jQvtquU6NZ7azGe1aPT5q7jRX121RKDQevY8CX8V1Orve+utPXdKvkGw7WL22/ZSilRpVSrlLKAz6N3783IuskvguTdStlFREb/wH+b5RS/ytoXpH9u5Ssy92/y61w7AM2BNHuIeBnga8us0w3w1eB+YwDHwa+UtP+oSBrwW5gtsYEp9FoVh6rbWwCPT5p7j5WzH0oInERSc6vA+8CDnP999V3gHeJSCZwaXlX0Ha7uCXyBZ9lRWR34MP/oZpj3RIuiR17H37/zsv6syISFpEeYAN+gPWSv4/A0vA94P1LXPeNyibAZ4BjSqk/qvloxfXvlWRd9v69loj327ngR/KfxI+E//hyy3Mdcv8dvkmqiu+/9kv4fm3PAv3AM0B9sK0A/yW4xkPAg8stv170operL3fr2BTIrscnvayKZaXch/iZeg4Gy5F5WW7kvgL+BX5g7ingI7dQxlty319JPuBB/IfU08CfAXKLZf0fgSyv4T8Et9Vs//HgvCeoyd50pd9H8H29FFzD/wTCN9m3j+C7S70GHAiWJ1di/15F1mXtXwl21Gg0Go1Go9FoNJpbznK7VGk0Go1Go9FoNJpVjFY4NBqNRqPRaDQazW1DKxwajUaj0Wg0Go3mtqEVDo1Go9FoNBqNRnPb0AqHRqPRaDQajUajuW1ohUOj0Wg0Go1Go9HcNrTCodFoNBqNRqPRaG4b/z+aiTEWK/EX+AAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize = (15, 5))\n",
+ "\n",
+ "plt.subplot(1,3,1)\n",
+ "plt.imshow(targets[0].T)\n",
+ "plt.subplot(1,3,2)\n",
+ "plt.imshow(logits[0].T)\n",
+ "plt.subplot(1,3,3)\n",
+ "plt.plot(X[-2])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import IPython.display as ipd\n",
+ "\n",
+ "ipd.Audio(X[0], rate = 16000)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "minibatch loop: 100%|██████████| 681/681 [13:05<00:00, 1.15s/it, cost=0.539]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "epoch 1, training avg cost 0.607221\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": 
"[base64-encoded PNG figure data omitted]\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "[base64-encoded PNG figure data truncated]
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
BlJj52Om3BvOR3HuC2OxJRImiCPPo5NAhAwSYAyFj32vLnXPtRGrI9eRHsw+3LFdC5Al75f6wCdywWq8FVL49UWre2S8UpM2fGi2wj0Xp4ghSE/mRLuVbhQsfN4gtSOcOLXW+MTimTbUfYaOnMlLDw/PeT3rbPAp1mUr2sM+1NsFFxXCJSmICdQrH1vYHwLiwAAGD9JREFUFxND+5LDJOI5f0PfXlt2hHTLUPQ08dCQrk99euhrf3LLr+pIGEyMRU0K9N4YM5uhh4WvYqp9r9WJ391g8NQS2aalToSip8k2K6wWqo6mfWFMvtwi7OeowT4oRbsymFZI59UxLlDUaUC0k2OTgHo2IdyrCPtTKEoWqp5vJ5r4+vz55wrirQnTk23KnibZLgm3xl4usawR28LGmnB76hOXtUFPCx/tEPHUss3i98yvXqN44jSDhyLaVw3iHOnVkqoT0H1xQt2LiXcr6lQjZU0wsW+vCOfPCjLJceOJT4NUNVLUzfaC4vQso1MKcb6JL9mpKXqaZDunbPtq4ez8gGo2waUJrhHwRgsooU4DwmGJaUfYWCPWEV0ZYFsh1Qmfhtm736dA/EgOfUXUfk2yWzM8GzF4cg65vAnOoUpD2M99hnl/guRFU952Q/dJWUFVM/7IWQYPRT4stu8b5MVYsueusvXhLoMHY0ysvMEO+srsrbtYjp2O20DCCHfAvfg2oBfmMdu3rfS7CUc+gek6KeUnnkYXlqoTHKbyVemwsSA1XPoxC1ZIXw2JB75S1vcoQ9AEf9MrjvYVH/ytW748IJh6XZNw4jCxUKUQTiAeGqqWosp8H1qVerrZ8Skv+eFHG8y+WLP5tCYYC4tfrw5pIupEfF1G7mWzgqlvU1LVDVIeCmys2Hwq9HSxoae5bW05TAxVR9i/z5Je8ZQTC1+vfVL1j79wy+/q6I0wpcEaVJpiJ5PDx+DMaajN9chDs9/tcDBCDkfKQXjqDv7ng2NuHB0qSV4XSTngqn/d4xuMzjsdsUd+hN2Exgi2CRkdPNYX195wv9vh4Ms5/JLewo/z4Jgbp7I3CnsdhMle9/gGhnknptcj43Qc485wbLB7DMcGu0N8O6mQdwPHBrtDfDupkHcDxwa7x3AkvETRCtXKkHaG3dnF1TV6pocZ7BGcOY1Zv3pT0lI/9jBs7WK2d9DdLmY4vC4FBahOBzU/e6jBcnhct4vMz2LW1lGdDq4sEa1heQH6e7hp0540HhOcWsVlLWwa4776vD9vliFhgBnuI0pQ7cxTPiivLotuCFGc88+NQVZXcFc2seMxqtPB7u+j5+eQMMTuDZFTJ5DxlHr9ip92RW4rNHAkDEYQoJYXse0E1esgVY3tZQQ7HcqzC5j3LJNcHDB+ZI5oWDGdDWlbR/HRB0iuTFCVIT/VIdnwxpGyxsQh9dkFwu0Jan9CcXYBGRW+EvijT6AvXMPdtwIXGzLLlUWKk22i3Rw9Lhg9MkvnK1dwsynBA2d9NfF8Fxso9GCMnclg0rjptfF8YO5AB9P48Fig2Xl6gWSwwP4JTfdSTbw5IZ9PSC7sUj90Ar1fMn1knnYcXa9KHh11gssjGpq6WzjyC+d6IWP3Rz/un7c8S6fVTYN4BONThh/5xLOMqoSpCfnaxknySUS7kzPNQ8LQoLWl/PoM8Y4Q5I790/5YmjiqSS06V+ipULctwb5C5748zjw2ptOeUlQh42sZrYUJ+SQiSUvKMqCT5Qw2Osx+NaBqC1XXoae+KzQce2lHVeFplIwvu8P57HH/MfihT32Zti7408Epnn/hFHoYYOYrspkp40GLMC0JAkv4x11fa3mbQtLjEXYE8Y5knEVEi8hXReS3m9fHTDh3AW/Frf9ZfDP6Af4lngnnIaCPZ8CBG5hwgH/V7HeMdwh3SqxyCvgh4F8A/1BEBPiLwN9sdvk14Bfw1EU/2jwH+G/AvxERuR2TgJ3JmH73R3Diy5dVDUVPSHYtkyVN2YXTn77IpIoYlyG7V3qoq
fbazKU6pGyd+XpAMPVspZNlXzXltFfSO2DUtjFEg+tUrzaE0cM17eV99q9lSKlQZUOb1G0CzNoRXQ2Z+4ZP1XjKP3+/TfqeykHnDhv60gAxHNLJbr8v4OSn1uiEOV99+Qx6N0CVgo0dpm1QaY0bRriWYekPfF+Y/a233wzxS8A/AjrN63neSSacsEv7pT2fyu/EhFv72DRGX+vTemSF6ULIq+l9BBNh9kXDKlC2hWwDxic8b2Jr25Jd2AMtuECRXQ2oW5q6pYgHNTZUXst5q6CcjQj3a/S0ppiPmX1RmCz0SGtfVifOcyWmW4oyU9QptHYs3W/2Md2EqhvhlBBvXqfLO6ANdFojZYVLQiSvyM5Z1ienWZuDlRcsYiAa+pxdlWrigWOypADF7Ll9bKBQxdugkBWRHwauOeeeFZFP3ol17wQ3MeHInDPPvwD4OfrgcmtAr1+hDbT/683HZ81jdOM5b3ium7/4hm0H+95YzB2/wTaA9DWP3HBdB/2RB51Fb4aVX371DbcffObBdTUi6+DeoHq5wZ2MsE8APyIiP9h8Rhf4Zd5BJpxj3Dne1Olwzv1j59wp59xZPEnK55xzP8E7yYRzjDvG21k4/xzvEBOOnc0Y/sDHfAdmw4lrYl+PYUOwgfDBn3iOrbyNdcK3XlwFKxBadFrjjGDHIeGupveyX3Dnc0I54zskbdA4HBHonIZW1jsFZRempwzLD2yzsTHji60ig9mOcW0DtRB2SqphxNxXfL2JLny9Rt0S0i2LCYVw6g5DUzZsaju00H9E4Z4Y8YHVdS7szbF5aY5gr+H/7Rp0r8Rux8hcSedLLcRA/ZnjhfM9heNS7e8gHBvsHsOxwe4xHIlo/YFYjlOQz3q++DoVsJ4GYnzacfapy1zYnKfbmdB/dZZwT1HNWaQUbMtCYOl9PUJVjmjoKHqKsufFdA50waqOZzMN971gTj7vCZmnKxa9mFOPQ5K1kHylRmrBxRY11Y3IjtB7ER89Ca5nE1QN8dA2HFbe2bDhAQUfXP6044GHNyjqgGuDNu5SRrgnFAsWvTLFOkHWWtQ9w9KfeGfknYh0vKvQ05r2S3uItXSdQ6YF9coMemcfO5PhvixUv7dM++GIhWcty3u77L1vHqf8F5Ts1KjKEvSHOKVwSeA5CHs+oqFKQ9WNSM5vMXnPEuGoQg8LipWMqJ9Td2LKXoQuHDasiXcKivkYp4T2S7uM3jNLdnnsScUmFbYdeaYAY1F5fV3O3l33FJ3WoIX7yg7xOc21v7zC0iuG3jPrjB9b9pXCUQjWMVlx6FLRvjT2/9PbiXT8WcAp37hgWzFVJ0IXnsquXuqCdaiy5uJPGsJXYOd9GarKMDFkG4bBgwEmFiaLiqTfovfSBD2tqDveWGUvBBeSz2lGp1bpXiqouhFVJyTcrykWWgzPhEyXhHgAcd8yWUypW4JJoMrmmCwr8tk288/tM72vQ7RXUc6FqNr5NqW9HGzTeekcTinf1WIc7W9scP6nTlG3LVWmscFJxDiqjkZVjsHDmqVnCvKFEJM0bUzHbG73Fo58xvngFyVhhDPG19Y/8agPrE5y3HCf4kMPoHNDvhDR+doGdmsHCQNYXkTKCjPfQW/0qS+ve1oIrVGdtj9/WflCmzCAyxuY0Qi9tAhlhen3QWmCkyuYxRn03pj6lQu+jqOssNs7SBJjH74Pef48kmVImoB11GuXUVmGnUyQwEcYRSs/PYpgpzn6gfsQ6xh+YBmrYeaL69RrlwlOrWLnOp6B9EvPEayexPYHfnrNj0fYPYUjP8KklSCPP46NAsT4rsuDtlSdG8puyODhkOyqYecJTe/8QWjIH69KqNpC72JN68oUGwfUWeDTKbslxUx4mDPzHp8jGhSYJKDONDYUdh4LSDc9O0AwcbT6lrLtxdiqzLcidV+devVYEWys0ZMKGp56KWt/7zI+1YJSYC1bT3uCFhOL56TX0L7inYoq9dRI4dh7xUvPjP1s85V3SMrj3cLxC
LsZx6Gp7yAcG+wew7HB7jEcCafDzqRMP/kRVOkXlHUihxL0qvbMAdH3bLO90UW3DHYQIYVgZ2pkrNEThUktvXOabNPf0PfOelGcdNNRzPiISNnz542GXlHIxN6hKGcdVc+ipgqTGcQJUnonAQe25Yg3Nb3zjSiP9Tk1gHDifB2IdZhIeR3LhpnUaeHqn9PUqwVnTuxw8co84VpMNBSvTX2yRoygx77AtfuKl7y3v32cD7uncOx0fAfhSEyJrpNSfewpL2ufqEY6XhOOvd6XMo7+IxqT+Jr2aI9DrS4T+1S/OGivG4LcHkbMJ4uB11DZqil7Gqub9H3tyK5WlN0Ap6HoKIYPQjAR0g1Hnfj6fjGeTkIXfmqeeanEJNqvDWcCdO4OaWlVZQ81w8CXNajaUfYCtt/nrz0cCcm2Ix5ep1waPAzpVaHswfy3mob2z91LtA/HOPqRjgNiFafFS71riAY145MR0b4fMVc+oQkmQrFsyC5ogqkvrCm7EPeh7EHnkiUZGIKJYXgmRtVepHQ6r6kyL84dTtxNDoMuLcPTAftnoXXVF9jk80J73esuq2aUiYX2miVoGK9N5PfVuR9ZOjdeaNU2Kk21wwbCZDlk68NeMNzEnlgl2XVMF+WwIrlz0ZHP+UgNDtznj0fYPYVjp+M7CMcGu8dwJO5hdFLqjzzlayFEKLs3NHeLeA2vOUe2DsMHIRx5Bdiqg8/0Wl9nsfzlCqfxYtuzwaGnZiIhn1WHHlz7qkGVjulCQN2C0f2+2DQa+HuWGDCR73wRA7oCVUBn3RBMrb/PivcEW9teXE7nBhNr7y025GY2UGx+2KvMmtTzCHfPK6I9R9URxqccwdjLHNsQFp6r/aL7+B52b+HIe4mkCfLexw/1S5xWnvGzP8W0Y8peSP8Rr8onTZVS0jf0HwrpXaiZzmucwOyLOeHOmKqRBD4omQ7GNSbWBPsl49MpwdQSb+eYLKTsBeQ9H8ZqbRv2HgiI+45o7MNNVUsR5I5wVHt6wHZANChRtaXsRUSDAqktNtKH8h1SW5xSoIXdxzueZu+0Y/abkG1UmJbvXasyxWRZITXMvlgS9RtW0z99m8Lb7zakrFHnL4N4g0m37d3jyYRgfpbgXJ/4d3YZ/5WPEu3VRIMCvTNCzBKtL5+n9eDqobI6V7cIN7fRD6zivvwc6slHUaMpIeACTW9ti3pjE/X+xwiu9tFrl0mffuKQUVRMSnbuGvWrF6k+9RR6akm/uQFaQVESZi0YDJF2RnIpR8IQN56gAZyFMPJJzYYzZPaZXXS3i6tr7BMPEmwNcWGAa0Wo0ZTeXBu9vg1B4AXvnLutLPDxlHgEceSnREli1MOPItZSLrfR09pPM02pQLSTc+mHe4RDmDnvlYlMrGht5lTdiLrlR2Z2eYIUFTaNsIFieH+L7gXfHKf3S/pPdJn7Wh+bRn7qaZya0ZkEq4XZc/sM3pMRTnx4Kp9VdC/Wh1Nr9qrvEi0XUlRhCAY5Ng0R60sEpFG3JdBQ1RBohu+dY7qg2H26ZvkPNOmWr6E0kWL7fRHLz+aMVyJaOzXhsPTlBs8d8RKBzswp94Hv+lnA1zmo2nm95aGlSoXRfQpxUHUc5UpF65XI9y5HXhK46lmCiSJd933R4dQxnfURkwPpKhP7yIIYyDZ8X/J4RSEGilnPvaFKwWn/fZQzlvZFTb7okAqSHSG95lm9fWrGny8aNekV4w5TKuJ8FbCNhPGSYnQ/JI/sUZYB+rk2uvRtUCaG6bIj3hVfjbzn0JXjG5/9JfZ3147wCNubkPzPLwHX20izG97vfRvnTN98lzva5+0iBRbf4jHKjW/53pEwmOum5J/8CFI7posBre2afFajKygzoW4Jw0cswUioM4cuhdaGV08XCyZxiBWSa17MLZgaposhRVcakTU/yqqOHxXxwHf628ivpWzkK7Dq1BH3hbLnqFNHsq2oW/74aAi9V6rDNeIBSbNTQjD18U6n8GwFTSWVU8J4WbPzl
AHtUBNNtKfILrvDa5muWOJthS7wujGA/d9HvLZeFTXtr67jshbpeQdakZ53niSs06KabWGjhHjgF8G9V0qUcV4efuwDuKpyxBv7/ovSQrSjoLa4Vkgxn3jSr/0C0/Uq7FJUuFCTL6fUqWL/hGb1D6eMziT0XnUk1wrGpxJMJKTXapyG1vo+2bmpJwqb6yCTApkWEIW+3PygIPaAv76qiU/Pky9ktC9b5r+wQXlyhnI2YlppwolFF5rWliWcWjrPXoEwQI9v7SUeCYM5rbALvcZhiLGRb2ZAQZ1o6lQzfKSm83JAnUE+H9O65qU9OmuW4ZmQuO+o0y7p5QlOCVU3opwJDiUKB4+2iYct4p2K4SMJ7fXy8L4zWtXowrH1wZRo6KhaUDzUIp8Tkl3H9pMh8a6jtS6MH13EhkKyVVCc7hD3PaObKmovQpDXmCjwdfbWki/FINB/rzBePUl6tZF1vFiw+eGEuXM1w/sC9luKaLDkIyVXb02/d0exRBG5ICLPicifisgzzbY5EfldEXmpeZxttouI/OuGuujrIvKhNz2/A5zDZjEu1NhYYyOFiTXRXolYR3YxYHLSUrUdC9+oCaeOeOAousoTscwIqnCHyg11W/tmBevTHNlGhZ5aJidiWls1wbjCRgob+T7lfEHIrvrQkzKQbtV+6rQQ7zp6r5ReGwZobeaowk+9B10skteHsvdiTKPrqUi2SqYrltlvOUzoE69B4di7P8bEsPVkQDRyBFOfBMW6m+SsXou3MsK+2zl3IznKzwO/75z7RRH5+eb1zwE/ADzc/H0Uz45za11BvGdVzbWwDSmKDQQTC7p02PmYvbM+M9x5VTF4X832EwHxwDFe9SI0xZwvrqkzRbGYNh0kgg2EnSdSgqlXsq0TP2WF+zA6kxIU3r3OZz2bznjZZ7frRCjbQj7vWW/CfSjmgibWCflCcri0UC1NuF9jw4P4ZwQiOO0/a3g2Rpam7D7eItoDq4W664uA6gzCoRcujQcOE2svxhrcehzdkVsvIheAp280mIi8AHzSOXdVRE4Af+Cce4+I/Nvm+a+/dr9bnf944Xwz3ol8mAP+r4g821AOASzfYIQNYLl5fkhd1OBGWqNDiMjPiMgzIvJMRXGHl3GMO50S/7xzbl1EloDfFZFzN77pnHMi8pZW4DdSF/XCJafnFpGs5aezMIAo9NSsEx+pGHz8FHG/ZnQ6Ipw44oEPxk5WImY++y3fDlRUyMQbX2rjW5DWt7HLc6i9MeWpOYJRgdrew6UJ9Pdgtod56RV2/45Xau++WlLMhbQvTQjWthl+5DQmEtqXc4IX1qCukXYbl7Vgp+9bm0RwtfGtRoAz1pcL1DUszTN4cp6yK8w9P2HwcMriH67jwoDJQ/OEkxo1rVF5hdoZQqCRK+Ebf2l8G5EOEfkFYB/4uxxPie8K3lYsUUQyQDnnRs3z7wX+Gdcpin6R11MX/QMR+Qze2di7nbEAyFrIe5uIuQOTBjf1+ppWwJVPxHQvOPqPQfdlf5iqoU49jWvZEboXa5LtHKkt0xMprctjpqcyJgsaG0H3Ys1kKSDpG5yGYN9QzIUEU8vuYwHtNUs+14S0LMyeKxifjBifFDqXLJ0LUx8K6/p2WZz37IJRgdOCTUKc4Ft+8VmIfCXj0vcFRENFsuUTpdmmpegJxaxfPLfXHOm1mtb6CBcF8PzbS68sA7/pKRIJgP/inPusiHwZ+A0R+WngIvDXmv1/B/hB4GVgAvzUm36CO3CHHSb11HZVFqBDhdXC3gMh1aMT6o0WOJguCUF+XTJqvOpItv2XV7VDdOFjftVcQt1S1C3fgzV4KGT2XNH8CDTB1DBqR1z7kMa0LKAIxlBlUGeOyUpEerUpG3c+gmESTZAb6sT3KKui9v1g1qL3i+tN6VEAzrF3f4ibK6lsSJ0Is+cgGhnymYCy52hf8j+6vftDdJmhyluLvcERCf6KyAh44W5fxx1igddwP74LOOOce8MQ5JGIdAAvO
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.539]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 2, training avg cost 0.538868\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO2deXgc5ZWv39NaLEveZCTvsmWDDdgsNsiGQCAE7NiGG0xIAngSJgQShxuYkCHLdS4ZQkgyScidySxhQsidZAITQiCTxXMxA4GwTBIw2BiMjTfZBizv2Ma7LUv93T+6JHW31VJLqip1lX7v8+hRVfXX55z+qvrXVd936pQ55xBCCBF9Er0dgBBCCH+QoAshREyQoAshREyQoAshREyQoAshREwo7i3HVVVVrra2trfcCyFEJFm2bNk7zrnq9l7rNUGvra1l6dKlveVeCCEiiZm9les1DbkIIURMkKALIURMkKALIURMkKALIURMkKALIURM6FTQzewnZrbTzFbmeN3M7J/MrN7MVpjZOf6HKYQQojPyOUP/N2BOB6/PBSZ6fwuAH/Y8LCGEEF2l0zx059zzZlbbQZN5wAMuVYf3RTMbYmYjnXPbfIoxk7degA1/SC0XlcC5n4QB7ebY95xlP4N9Danl066AUVOD8bPy17BzdWr55PfDuAv895FMwpL74MheSBTB1I/BkBr//axZDFuXp5YnzYYxdf7aP34k9TkaD0NpBZx3M5SU+Wd/XwMs/3dINsOw0+GMq/2z3bAM1v1XannKVTB8ij92l/4U9m+FiiqYsQDMemZv03/DpuehtBxmfCb1v7useATeWQ8jzoDJ87pnY92T0PBy6vt32hXdj6UP4MeNRaOBzWnrDd62EwTdzBaQOotn7Nix3fPW8BI8/z3Aq+NefhJMv6l7tjriyLvwn59rW9+1Bq590H8/AIv+ChoPppbf+hN8crH/Pt5ZC098pW3dEvC+L/vvZ/EXYf+W1PLW5fDxX/lr/+0X4Km72tZHnwvjL/LP/qsPwbPfTi2XDvRX0J+/p03Q9zXAh3y4mD20G/7f59vWJ82Gytqe2Xzqa7BlWWp5+JkwcWb3bf3mM+CS0L+y+4L++Jdh7yYYOEqC3gmhToo65+53ztU55+qqq7t5Vn3hbXDXu/ClDZ7RpH8BptNid+49MPyM4PxA6mzwgs/B+PelloPyAXDNg5nrQfg55xMwZjq4AHwkvf0w6xup/377aOmXC/4qGNujzkkJrl+2W+zUnNfmo6ckm6FiWKb97tLyvUn24PvTEkMQx1PM8EPQtwDp1+5jvG1CCCFCxA9BXwT8pZftcj6wL7DxcyGEEDnpdAzdzH4BXAJUmVkD8DWgBMA5dx+wGLgcqAcOA58MKlghhBC5ySfLZX4nrzvgFt8iEkII0S2if6eoc+HYDcpPyniO5aB8hOQnkD4Ler8EGX8AtoPu457Ydz4d1y1vDfQ7GA+iL+iib9LTXOvY4XN/qH8jSYQFPawDzsLxZRbOlyg0H0H7iaj9oPrG9/0qQY8iERZ0IYQQ6UjQhRAiJkjQhRAiJsRA0MPKCgkQF3RmSDt2
Q/ETgI8T4g4r+8gPc0H0TdB97FOWS4+ON5f1X+QiBoIeFjqY+gSRTY3zJjEjG7/wg+gKelhpVaGlb4WUTROKnxAydiJrP6C+8dum0hYjSXQFPUx0bBcgQe4U7XD1QTSRoAshREyQoAshREyQoAshREyIvqBHPc0vZTzHclA+QvKj4lzB2w4kjdPvdMPs5a6acT7E0jeIvqCLvomyMLJQlouQoOdJzIpzhZLyF9HiWRBs/CrOJQJEgi6EEDFBgi6EEDFBgi6EEDEhBoKu4lxd9hGaHxXnyjQXleJcPtlXca7QiYGgh4UOpj5B1FPjoh6/6BHRFXQV5ypgPyrO1YFhFecSgRFdQQ8THdwFiIpzBYv6IIpI0IUQIiZI0IUQIiZI0IUQIiZEX9DDSvMLFBXn6pb9dlb9dRWl4lx+jnmrOFdUibCgh5l9QvAHk2q5dMNHEAQhkGkEXsvFp+NUc6KRJMKCHiY6uvsUympCx3w0kaALIURMkKALIURMyEvQzWyOma01s3ozW9jO62PN7BkzW25mK8zscv9DFUII0RGdCrqZFQH3AnOBycB8M5uc1eyrwCPOuWnAdcC/+B1oblScq8s+wvQTuH0V5/Idv+JUca7QyecMfQZQ75zb6JxrBB4G5mW1ccAgb3kwsNW/EAsFHUx9gsimxoWUjSUKmnwEfTSwOW29wduWzl3Ax82sAVgM/FV7hsxsgZktNbOlu3bt6ka4GcZ69v5C86PiXF10EVX7Ks4lgsOvSdH5wL8558YAlwMPmtkJtp1z9zvn6pxzddXV1T65DgEd3AWIinMFi/ogiuQj6FuAmrT1Md62dG4CHgFwzr0AlAFVfgQohBAiP/IR9JeBiWY23sxKSU16Lspq8zZwGYCZnU5K0Hs4piKEEKIrdCrozrkm4FbgCWA1qWyWVWZ2t5ld6TX7AvBpM3sN+AVwg3Mhzc5ENVsj01mO5aB8hOQnlFouAe6nSNVy8RPVcokqxfk0cs4tJjXZmb7tzrTlN4AL/Q1NCJE/GvMWkb5TVMW5CtaPinPlJjLFufQDEUUiLOhhooO78Ahwn0jM0DEfTSToQggREyToQggREyToQggRE2Ig6CrOlZ+PDnz66ifgFEwV58o26v33ccxbxbkiS3QFPfRaLkEfTKrl0jUXAdlvEZ6o1nLx64daE8ORJLqCLoQQIgMJej7obKUAUXGuYFEfRBEJuhBCxAQJuhBCxIToC7qKc3XDR0h+VJwreNt9qjhXD0LpI0Rf0IUQaMxbQKQFXcW5CtZPWJ8jEFScK9OeL8Z8tCU6IsKCLoQQIh0Jel7oDKPwULXFYPGxD9SfoSFBF0KImBADQY9BLZcMtyFl7YRSM0a1XDLNBVnLxUdUyyWyxEDQhRAaFhQQZUFXca4C9hPhTBoV58q0548xH22JjoiuoAshhMhAgp4PmqUvQFScK1iU5RJFJOhCCBETJOhCCBEToi/oUS/OFVoanopz9T4B9E2rHT+HNQq1OFeU9nXvEGFBD7mWS+BuwqrlQkgZKBHNcmlzEJDZoGu5FKI9jaGHRYQFPWR0dtBHiPp+jnr8oidI0IUQIiZI0PNCl4yFh4pzBYvSFqOIBF0IIWJCXoJuZnPMbK2Z1ZvZwhxtrjGzN8xslZk95G+YHRHx4lwnFM0Ky08Y2UFRL87Vnr8CsdVmNACTKs4VVYo7a2BmRcC9wCygAXjZzBY5595IazMR+ApwoXNur5kNCypgIYQQ7ZPPGfoMoN45t9E51wg8DMzLavNp4F7n3F4A59xOf8Nsh9CLcwXuCBXn6oqLCNsPsjhXQdrTGHpY5CPoo4HNaesN3rZ0JgGTzOxPZvaimc1pz5CZLTCzpWa2dNeuXd2LuNfQ5V6fIOrpqVGPX/QIvyZFi4GJwCXAfODHZjYku5Fz7n7nXJ1zrq66uton1yGgWfoCRMW5gkVZLlEkH0HfAtSkrY/xtqXTACxyzh13zm0C1pESeCGEECGRj6C/DEw0s/FmVgpc
ByzKavNbUmfnmFkVqSGYjT7GmZuo13LpjRorgRKzWi6+2g+yloufqJZLVOlU0J1zTcCtwBPAauAR59wqM7vbzK70mj0B7DazN4BngC8553YHFbQQIhsNa4g80hYBnHOLgcVZ2+5MW3bA7d5fSKg4V8H6UXGuDsxGpThXwRoTHaA7RfNFl3siEug47ctI0IUQIiZI0PNCl4yFR8Ru/IkcSluMIhJ0IYSICTEQ9Iin+fVK0ayw/MSgOJef9gPtGx/PgoNIqVRxrlCIrqDHspZLWH5CyqaR/fYMB9P9quUiiLKgCyGEyECCnje63OsTRD09Nerxix4hQc8HzdL3MbS/leUSTSToQggRE6Iv6CrOVbh+VJwr25j/dlWcS6QRfUEXQqBhIgGRFvSYpRMaKs7VZR+BOgjIbFSKcyltMYrkVW1RoMu9PkPU93PU4/eX1xv2cd9zG0gW2Pd3/oyxXDzJ/6e2SdCFELFl8cptPPb6NiYNH9DboWSw78jxQOxK0PNCl4yFh4pzBUs80haTzlFanODJv35fr8UQJhEeQ29BtVwK149quWSaCqJvgu5jH7JTekz3a7k4B4k+9PscA0EXQugqsn2SSUeiD11xRVfQY1mcKwxfYfgJK5MmSPsBGg4idmW5tEvSQZEEXQghok/SuT41JSJBFyKdAktv6zJRj99nks6R6EOD6BJ0IURsSTqNoYts+tABER2C3Cfa3/6mLfpnqqskHRL0SBFY1qKKc/XYTxh9qOJc3oKfohVEca4e2GrNWuxO2qJT2mI0iFstlxAyQ1r9hOBDtVxymFUtlzBp7mNpi7pTtBOONTdz/FgTWIIfPL6GGy+sZfPewzQ1Oyr6FXPaiIEcONrEwLJiiosSHDzWxPodB5haM4QXNuzm/AknseXdIzQnHbVVFSSTDkfqzKEoYazbsZ9TQ/w8K7fs4+TjSR7+0ya+/txj3PsX53DLQ69w22UTeWTpZq6aNprhA/txVs0Qxg0t56P3vcC3rz6TL//HCi44+STMjHXbD3D7rEnUVlWwZNNufrN8K5+fOZF7/1DPj5KOI0ebeGvrPvo3HeKyhY9x1wcn8+9L3mb2lOG8sGE308cPpaaynLPHDGFIeQmfeXAZ3776TD738HKumz6W+p0HGTu0nIsmVTF8UBm/Xb6F4oRx5ujB/PnJNXwxxP4KAkebxDnnUje/dPM0MukcCVK3kg8GVm/fz8jKRha9tpX5M8ZSUtR2zrb3UCOHjzdz9HgzVRX92Hu4kaqB/RjQr00Gtu87SrWDXfuOMgL40fMb6HdwE//nyXXceukpTK+tpKQowdih5Xzl16/znpNP4olV2xk7tIKpNYP54Nmj2HOokc17jrB561au6W4n+USyj91YJEHPwf6jxxkE/Oi5jZy3dx8O475NG7jvuQ2++inlOOvK4OdL3uZjY3013S7fWrya+0qaW9dveegVAP7x6fUA/PDZEz/ftfe/CMBbuw+3bvuL/7sko83z63YB0NQvyc+XvM2pdoxKS9WruOs/3wCgfudBAF55+90TfMy7908AfPe/1rRu+/5T605od2XiCJTC3sONVHb4SbtLsEM4yze/S9HRfexyxk0LH+uxvZNtC0/3g6fX7OTqIvhfv1rBCncIgDt/t6pbNp8rPcxuihmRgNXbDvBbb/995/E1J7R9fOV2AP7Ebn7xEvyv/3i99bVBHOSaMth/tIlB3Yqk56TSFvuOokd4yCVYXn5zLwAbdx0Kxd+WvUdC8RMXtrwbvf463pzkQEBFmQqZ3qx06BwU9aFTdAl6DvoVF7UuuxhkPWzbd7S3Q/CVQDUioN29etuBYAwHgJ/HfG9+f5J9bFI0ckMuRxqbWbV1H8+v3cbtwN89uYZ/frznl64nVZSy+1Bj63qtbePZfj022ymWdol/uLGZkqZmjh49zrGmJCdVlOJcavinvLSY/UePM7CsOOPHBlITP8ebkzQlHeUlRTQlHQePNQFQWpygOGH8bnkDNwf/
cTK+usF8j1L9tWP/Mc4EVm/bT9PgfUwcPoCihPHc2l0MG9SPtdsPcMVZI+lfUsTx5lT/tPTjCxt2M722kuakY0h5KbsOHmOA17+l+1NjyG++c4gJwPLNe5g4qoqK0iLWbD/Axl2HONTYxPnjT6JmaH+SDg4ea+LQsSaKEsaza3dy0cRqks5RNaAfW949wtDyUnYeOEZzMpnWN4V7A1B6bD2JM3v/O5fErIh3DzcyuH9J6rb8hOGcI+ngcGMT/UuKMEtta0o6ykqKcCQx4FhTM5/61yXsOdTIqq37ee8pVUyoruDhlzbT2NzWt7e8/2Q2vXOI7fuOtju8F2ciJ+if+OlLvLRpD0U0c3uZf3bTxby3eGbtTibZYWbd9aTvts9P7OLmUt/N9hqPLmtgZin88x/Ws/ipP7bb5ku/WtFlu18r3sbVRcf51bIGvlwC1/7oRRop6Wm4APwmwP4v3J+HNib878W4bgwKrOjXxCDv1+G/17/Tuv2P9e/wx/p3Tmh/7zP+znNFibx618zmmNlaM6s3s4UdtPuwmTkzq/MvxEzmTBkBhHcZF142uoXymcLw4whj/0TTfnD9769Nf4dcRFh0KuhmVgTcC8wFJgPzzWxyO+0GArcBS7Jf85P+pUWdNxJCiD5IPmfoM4B659xG51wj8DAwr5123wC+CwQ6+5aeVxsmhTzmKUQLOk77Nvmo42hgc9p6g7etFTM7B6hxznU4O2lmC8xsqZkt3bVrV5eDBSgp6kNT1kII0QV6fLprZgng74EvdNbWOXe/c67OOVdXXd29J15nn6EHJe/pdp0L70ckqDOsbLvBncn5kyGRixabQVQwaSF9/NjPz+BX9kgum35hpPdvT7JcWvaVtdrtbjw9jaWvkI+gbwFq0tbHeNtaGAicATxrZm8C5wOLgpoY7a0hFyEKmTjcKyF6Tj7q+DIw0czGm1kpcB2wqOVF59w+51yVc67WOVcLvAhc6ZxbGkTALUMuYWafhOMHwihi5FwYWS7h+IiifRfQeabfN1rF5caivkangu6cawJuBZ4AVgOPOOdWmdndZnZl0AFmU6ozdCGEaJe8bixyzi0GFmdtuzNH20t6HlZuSool6CI4oj5Oq3Phvk3k1LEvFdoRQoiuEDlB7w05j/Y5WzwJclxWY766UzSqRE/Qs2obh5XmFxThpRN27DcIu0Gl1OXyV/ik940/BFJEwPyJ069j20JLGYg+0RP03g5AiAJEVxUCIijoB46mysKGdwCHlbYYzrmmI/hL4LB8RNF+UOeahS3ohRxbvIicoDf34tNPhBCikImcoPdWkku0xmr954Nnjwrcx6jBPha47yZR389hxj9qcBkv3zEzNH+icyIn6FE4Qf/pDdP55YLzOXP04N4O5QROHjYg52sj2xHUVV+fzcqvz+ab885o9z2nj2x7/O/ffuhMAMZXVeT0UVZy4iFX/625rPz6bB64aUa77ykEoe8NHvrUefz6sxd0+/2fu/SUjPWrpo6i/ltz+c7VZ3bL3rfT3nfRxCr+/JXLqB6Y+7Fe86YGfxIgMoncE4uyHzibPiPvJy1nOhOHDaA2MQCaGplcNIiS4gTJpOMv3zOOyvJSptcO5ferdzB55CC+tmglfz1rEhecXAXAoze/h5Vb9rF6234WvbaVb151Jq+8vZc5U0bQsPcIjy7bzN98YFyq6HDAtHyeq88ZQ+WLpcw/vYYPfeADlBYlOHY8SVlpgpJEAjM41pSkOGEcOd5MRb+2Q+TN71zB0ePNNCdda02dooRRlDCOHm9OPS7sCWPuGSNp3r4HO9DMq5+YRUW/YvYebmRQWQklRQkMONrUTMIs5asowYCiBKcMG8jGv72cY01JDjc2Mbh/CceakpSXFtGcTD2SbNEDb8Dm9IJP/u//dIv+FufKbfesMYNZ0bCvdf2GC2q54JQqnHMMrShlj/dErS/POZV7/mstQ8pLuH3WJGaUV8FvYGrNYNgG1QP7MaG0glOGDeC2mZMY
[base64-encoded PNG payloads for two matplotlib figure outputs in the notebook diff, omitted]
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.54] \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 3, training avg cost 0.538869\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO2deXwdZfX/3yd7szZp0jVt05XSjS5pgRZooUW6KMhOfyKIyCJUBEQFQeQLqCxf8at+UcSXqKCyKlK0UGT3p7K0ZS1toatNKbSlG92SJnm+f8wkuffm3uQmd2Zyn8l5v155Zea5c88595nnfu7MM2fOiDEGRVEUxX4yujoARVEUxRtU0BVFUUKCCrqiKEpIUEFXFEUJCSroiqIoISGrqxyXl5ebqqqqrnKvKIpiJcuWLdtujKmI91qXCXpVVRVLly7tKveKoihWIiIbE72mUy6KoighQQVdURQlJKigK4qihAQVdEVRlJCggq4oihIS2hV0EblPRLaKyLsJXhcR+amIrBGRt0VkkvdhKoqiKO2RzBH6b4E5bbw+Fxjh/l0M/CL1sBRFUZSO0m4eujHmZRGpamOTU4D7jVOH9xUR6Ski/YwxWzyKMZqN/4a1zzvLmdkw+QIojJtjnzrLfge7a5zlUfOh/wR//Lz7Z9i60lkedjwMnua9j8ZGePUeOLATMjJhwheg50Dv/axaDB++4SyPPAkqq721f+iA8znq9kNOARx5KWTneWd/dw288XtobIDeh8PY07yzXbMM3n/aWR7zeegzxhu7S38Dez6EgnKYejGIpGZv/T9g/cuQkw9TL3H+d5a3H4HtH0DfsTD6lM7ZeP8ZqHnd+f6Nmt/5WLoBXtxYNADYFLFe47a1EnQRuRjnKJ5BgwZ1zlvNa/DynYBbxz2/F0y5sHO22uLALnjyipb1bavg7Ae89wOw6GtQt9dZ3vhPuGCx9z62r4Yl17WsSwbM+Jb3fhZfA3s2O8sfvgHnPuat/f/8G569qWV9wGQYcqx39t/8I7z4Q2c5p8hbQX/5jhZB310Dp3pwMrvvE/jrlS3rI0+C0qrUbD77Pdi8zFnuMw5GzO68rccvAdMIPUo7L+hPfQt2roei/iro7RDoRVFjzL3GmGpjTHVFRSePqqd/HW7aBd9c6xpt9C7ASJrszr0D+oz1zw84R4PTroAhM5xlv3wAnPVA9LoffiadD5VTwPjgo9HdDyfe4vz32kdTv0z7mj+2+09yBNcr2012Bh7Z4iNVGhugoHe0/c7S9L1pTOH70xSDH+MpZHgh6JuByHP3SrdNURRFCRAvBH0RcJ6b7XIUsNu3+XNFURQlIe3OoYvIg8BMoFxEaoDvAdkAxph7gMXAPGANsB+4wK9gFUVRlMQkk+WyoJ3XDXC5ZxEpiqIoncL+O0WNCcauX34c4wmW/fIRkB9f+szv/eJn/D7Y9ruPU7FvPBrXTW/19TsYDuwXdKV7kmqudejwuD+0f63EYkEPasBJML5EgvkSBebDbz+W2verbzzfryroNmKxoCuKoiiRqKAriqKEBBV0RVGUkBACQQ8qK8RHjN+ZIXHsBuLHBx+t
4g4q+8gLc370jd997FGWS0rjzcT8VxIRAkEPCh1M3QJrU+Pci5jWxq94gb2CHlRaVWDpWwFl0wTiJ4CMHWvt+9Q3XtvUtEUrsVfQg0THdhri507RHa59YCcq6IqiKCFBBV1RFCUkqKAriqKEBPsF3fY0P8d4gmW/fATkR4tz+W/blzROr9MNY5c7asZ4EEv3wH5BV7onmoURg2a5KCroSRKy4lyBpPxZWjwL/I1fi3MpPqKCriiKEhJU0BVFUUKCCrqiKEpICIGga3GuDvsIzI8W54o2Z0txLo/sa3GuwAmBoAeFDqZuge2pcbbHr6SEvYKuxbnS2I8W52rDsBbnUnzDXkEPEh3caYgW5/IX7QMbUUFXFEUJCSroiqIoIUEFXVEUJSTYL+hBpfn5ihbn6pT9OKveurKpOJeXc95anMtWLBb0ILNP8H8waS2XTvjwAz8EMgLfa7l4NE71mqiVWCzoQaKju1uhWU3omLcTFXRFUZSQoIKuKIoSEpISdBGZIyKrRWSNiFwb5/VBIvKCiLwhIm+LyDzvQ1UURVHaol1BF5FM4G5gLjAaWCAio2M2uwF4xBgzETgH+LnXgSZGi3N12EeQfny3r8W5PMerOLU4V+Akc4Q+FVhjjFlnjKkDHgJOidnGAMXucgnwoXchpgs6mLoF1qbGBZSNpaQ1yQj6AGBTxHqN2xbJTcC5IlIDLAa+Fs+QiFwsIktFZOm2bds6EW6UsdTen25+tDhXB13Yal+Lcyn+4dVF0QXAb40xlcA84AERaWXbGHOvMabaGFNdUVHhkesA0MGdhmhxLn/RPrCRZAR9MzAwYr3SbYvkQuARAGPMv4E8oNyLABVFUZTkSEbQXwdGiMgQEcnBuei5KGab/wCzAETkcBxBT3FORVEURekI7Qq6MaYeWAgsAVbiZLOsEJGbReRkd7NvABeJyFvAg8CXjAno6oyt2RrRzhIs++UjID+B1HLxcT9ZVcvFS7SWi61kJbORMWYxzsXOyLYbI5bfA6Z7G5qiKMmjc96K1XeKanGutPWjxbkSY01xLv2BsBGLBT1IdHCnHz7uExUzdMzbiQq6oihKSFBBVxRFCQkq6IqiKCEhBIKuxbmS89GGT0/9+JyCqcW5Yo26/z2c89biXNZir6AHXsvF78GktVw65sIn+03CY2stF69+qPXCsJXYK+iKoihKFCroyaBHK2mIFufyF+0DG1FBVxRFCQkq6IqiKCHBfkHX4lyd8BGQHy3O5b/tblWcK4VQugn2C7qiKOictwJWC7oW50pbP0F9Dl/Q4lzR9jwx5qEtpS0sFnRFURQlEhX0pNAjjPRDqy36i4d9oP0ZGCroiqIoISEEgh6CWi5RbgPK2gmkZozWcok252ctFw/RWi7WEgJBVxRFpwUVsFnQtThXGvuxOJNGi3NF2/PGmIe2lLawV9AVRVGUKFTQk0Gv0qchWpzLXzTLxUZU0BVFUUKCCrqiKEpIsF/QbS/OFVganhbn6np86JtmO15Oa6RrcS6b9nXXYLGgB1zLxXc3QdVyIaAMFEuzXFoc+GTW71ou6WhP59CDwmJBDxg9Ougm2L6fbY9fSQUVdEVRlJCggp4UesqYfmhxLn/RtEUbUUFXFEUJCUkJuojMEZHVIrJGRK5NsM1ZIvKeiKwQkT96G2ZbWF6cq1XRrKD8BJEdZHtxrnj+0sRWi1EfTGpxLlvJam8DEckE7gZOBGqA10VkkTHmvYhtRgDXAdONMTtFpLdfASuKoijxSeYIfSqwxhizzhhTBzwEnBKzzUXA3caYnQDGmK3ehhmHwItz+e4ILc7VERcW2/ezOFda2tM59KBIRtAHAJsi1mvctkhGAiNF5J8i8oqIzIlnSEQuFpGlIrJ027ZtnYu4y9DTvW6B7emptsevpIRXF0WzgBHATGAB8CsR6Rm7kTHmXmNMtTGmuqKiwiPXAaBX6dMQLc7lL5rlYiPJCPpmYGDEeqXbFkkNsMgYc8gYsx54H0fgFUVR
lIBIRtBfB0aIyBARyQHOARbFbPMXnKNzRKQcZwpmnYdxJsb2Wi5dUWPFV0JWy8VT+37WcvESreViK+0KujGmHlgILAFWAo8YY1aIyM0icrK72RLgExF5D3gB+KYx5hO/glYUJRad1lCSSFsEMMYsBhbHtN0YsWyAq92/gNDiXGnrR4tztWHWluJcaWtMaQO9UzRZ9HRPsQIdp90ZFXRFUZSQoIKeFHrKmH5YduOPdWjaoo2ooCuKooSEEAi65Wl+XVI0Kyg/ISjO5aV9X/vGw6NgP1IqtThXINgr6KGs5RKUn4CyadR+PMP+dL/WclGwWdAVRVGUKFTQk0ZP97oFtqen2h6/khIq6MmgV+m7Gbq/NcvFTlTQFUVRQoL9gq7FudLXjxbnijXmvV0tzqVEYL+gK4qCThMpYLWghyydUNDiXB324asDn8zaUpxL0xZtJKlqiwp6utdtsH0/2x6/tyzbuINf///1aff1XTB1EMeN9P6pbSroiqKEliff2sLT737E8N6FXR1KFLsPHPLFrgp6UugpY/qhxbn8JTxpi0V52Txz1YwujSEoLJ5Db0JruaSvH63lEm3Kj77xu489yE5Jmc7XcjHGdPXvSaCEQNAVRdGzyPgYulfP2CvooSzOFYSvIPwElUnjp30fDfsRu2a5xMUYkG50iG6voCuKorSD6WZZPyroihJJuuW3dRTb4/cYY3TKRVEUJRQYujzJJlBU0JOhO40Ia/Bzn+j+9jZt0TtTHcU5Yek++9N+Qfcta1GLc6XsJ4g+1OJc7oKXouVHca4UbDVnLXbm/Zq2aAlhq+USQGZIs58AfGgtlwRmtZZLkHS3OXS9U7QdahsaOFRbT35uNtv2HORAXQPlRbk8/sZmzq4eSHamsPw/O7n35XVcdeJIduyto6wwh0wRRIR+JXm8tWkXU4aUkZ3p/H7uOXiIwpwsRGDPgUOUBPh53t28m+H1jXy0fR/3PfEukwaV8sx7H7H4nY946uvHUrPzABfdvxSAr50wnJ89v4bbTx/H3S+s5ewpA1m+cSdFeVl8Z97h/OOD7VSVF7Dxk30MrShk5ZY9nN7QyMGD9Wz+aA+D8hr4+ZJVVBTm8tGeWh749waeWDidTTsPcMFvXgfgS9Oq+O2/NvDD08bx5FsfMndsXx5bvplTJ/Rn2vBytn9aS1FeNu9+uJtJg0p5e+kmzgywv/ygKTfauEecjQYyxEmvM8bQaKC+sZGsjAwyMxw5cm6QSSxNu91xtOvAIQ7uPsiW3QfomZ9DVa983ty0iwwRNu86wJFDyijIzSJDhF0H6uhdlNfK1qEGwye7D9IXeG/LHj7K/ZifPLeGBVMG0r9nD/7ryRUU5mZRW9/IlbNH0NAI67fvZeZhvamtb2BIeSFv1eyiqHEv1d53X4dw0ha7OIgAUUFPwJ6DhygGfvnSOo7cuRuDcM4Pnova5rt/eTdqfcmKjzvsJ4dDvJ8HKz7cw5gABt73F6/knuwGXli9lfvrN3L/vzc2vzb3J/+I2vZnz68B4Nt/egeAO5esbn7tL29+GNf+qbmGP776Hw6Tg9Tu+ZS7a9ZGvT77rpej1n/7rw0AXPdnx8e/1n4CwFubdsW1f3LGds7MgbqGRnLa/KSdxd8pnDc27SLz4G62GeHC6xanbG+YbOa5XHhu1VZOy4Tzfv0qb5vtKdl8KecAn5BN3wy49+V1/OVF5wc+3j659PfLm5f/+5n3o14rZi9v5/m5r9rHYJBudIxu8ZSLv7y+YScA67btC8Tfc6u2BuInLNQeauzqEDrMoYZGPvWpKFM6U1vfdfuqux2hq6AnICerpWtMAL/wB+safLVfs3O/r/aDxtfjaJ9298otn/pj2Ae8HPNdqOfd7tZ/66ZcDtQ1sOLD3by8egtXAz96ZhU/e+pvnvupki28mOu52VZIhDTtr2sgzzRyqL6B2vpGinKzWLttL31LelCYm8WBugYyM4ScrAzqG1q+JRkirNu+j8rSHmRmCAcP
NfDJ3jqK8rLIzc4kK0N44o3NXO7/x4n68vjzRXL6a+untRQDuw/UsX3bXqp6OXP5/Xv24OM9B9m+t5bK0nyMgcwMYef+OvqW5LF0ww4GlRVQnJdFo4GP9xykpEc2ew4eYlhFIZ/uPkgfoGbnAQYCG3fsJSunB/nZmeytrQdgy+6D9Cl2Bkd+ThY1O/dTVpDDmq17qSzNp1dhDgfqGli/fR99ivPYub+O0f2LaWhs2WeSxncwRsaWSpyx+3/3/jpKCpx+21dbz9ZPaxlclk+DMezaf4ie+dkIcLC+kSz32kFtfSPFNCJAXUMDF//mNV5cvY2e+dns2t/6bGfy4FL2HqynZ342r67f0enYbcU6QT//N6/x2vodZNLA1a2v51jNC6u3MlL2ceINT3tu+6iM7VzeVROZPnDnktX8MseZe1/cWOCZ3e9lbeG0zEM8+Np/+FY2nHjXy9SR7Yntx33s//T9eWhhwi3PYDoxKfB2bj3F4kyfvLh6G0BcMQdYtnFnSjHaTlK9KyJzRGS1iKwRkWvb2O50ETEi4tvF7Tlj+gLBTIM4foLBIIF8piD8GILYP3ba96//vbXpZYw2/NiEhXYFXUQygbuBucBoYIGIjI6zXRHwdeBVr4OMpEdOpp/mFUVRrCWZI/SpwBpjzDpjTB3wEHBKnO1uAW4HDnoYXyuacrmDJp3nPBWlCR2n3Ztk1HEAsClivcZta0ZEJgEDjTFtXp0UkYtFZKmILN22bVuHgwXIzuxO16wVRVGSJ+XDXRHJAO4CvtHetsaYe40x1caY6oqKzj3xOvYI3S95j7RrTHA/In4dYcXa9e9IzpsMiUQ02fSjgkkTkfPHXn4Gr7JHEtn0CiGyf1PJcmnaV9Jst7PxpBpLdyEZQd8MDIxYr3TbmigCxgIvisgG4ChgkV8XRrtqykVR0pmgkgSU9CYZdXwdGCEiQ0QkBzgHWNT0ojFmtzGm3BhTZYypAl4BTjbGLPUj4KYplyCzT4LxA0HcAmFMEFkuwfiw0b7x6TjT66KT3ma56I9NULQr6MaYemAhsARYCTxijFkhIjeLyMl+BxhLjh6hK4qixCWpG4uMMYuBxTFtNybYdmbqYSUmO0sFXfEP2+dp9Vi4e2OdOjaVE1UURVGisU7Qu0LO7T5mCyd+zsvqnK/eKWor9gl6TC3MoNL8/CK4dMK2/fph16+UukT+0p/IvvEGX4oIiDdxejW2JbCUAfuxT9C7OgBFSUP0rEIBCwX904NOCdPgBnBQaYvBHGsaWp8CnzZxQLxNPfXhNbba9+tYM70FPZ1jCxfWCXpDEE+S72YcPaxXyjYeu/RoDyJpm1F9i3z3oSg2Y52gd1WSS+Txc2FuFn/6auoCtuqWORzWpzBlO6kyrrKEf3zreE4Y1Tup7S+YXtWqrbqqjD9edGTC95x/9GBevGYmEwf17GyYPHyJ/z8aNs3JTx1S1qqtKf5vzxnVKZvTh8f/cf/cEf2j1o+obP1o88jxU1aQwy2njO1UDErnsU7Q0+EA/fRJA5g8uIyVN89h0cLpTB5c2vzat+eM4k9fncbr18+Oes/z35jBoLL8qLa87EyeuPyYQGJuYmhFyw9IXnYGRblZDCkvYGBZPvd9aQobbpvPlKpSKopy+d2Xp7LhtvlsuG0+3zhxJABzx/ble58bwyvXzWq203Sz17Rh5WRnZnDapMpWfq+YNYKq8gIev2w6G26bz/DeheRlZ/DgRUc1+5h9uCMIV80eyapb5vDzL0xqfv/tp4+jpEc2h/Xt+h9AL7nihOH89WudGwPfnnMYvzpvcqv20vxsLp0xlB+cOo7r5x3e3P6jM4/g1IjptRkjK7j18y2iO6WqlD985Sj6luRFJR/818ljuOusI3jlulncccZ4AH589gTW/mAe933JqfCRnSncecZ4plaVUdIjmx+eNo7Pjo/+EVD8x7onFjXGKHrkFXkvaTrSqSzrQVVOIaW5sPzcE9m+t5Yh5c4TcnrkZDK+sieP
XHI0H2z9lMP6FEV9ER686CjqGxuZPLiU/Jwsnr16BjU79/Pw65v46sxh7lbBZrWcOrE/pa/lcP7YwZw/f27cbR+9dFqrtoUnDOei44aSl+3Uo+9bkseG2+Zz8FADuRE3ewmGfsV5VBzWm6y9DWy4dH5cH89ePaNV2z3nTqa+0TT7mDeuH+t/OI/a+sbmtvKCHNgRWfDJ+/6LtOhtca5ouxdMr+LyE4aTm5XJhtvmY4xh9l0vccmMYZxV7ZRPenPTLj5/9z+5/PhhfPOkUTy2rIZrHn0LgIkDS8nY5hwkHDuiHNbDT86ZQK9RxyAi/L8jBwFQXVXK4F4FlBXkcPrkSq6YNYLf/HM935l3OLlZGfTvmcfyjbu47HhnTGaJMHlwGfwHfnTmeJhYBTj7/Kzqgc2xAZwwqg+vXDeLPsW5iAiPRE697dvu9mdq+6olyyUNjubSHOsF3W8yRehfkgcNh+hRkENZQevniGVmCKP6Frdqj52bzsnKYGhFIddFHDV1BZ2ZtRKRZlGNJF4b0PxMyI6QlZlBVoy5WL9h+EpPGNgT8noy43NjotpFhOe+MbPVthtua/lRPGNyJWdMbn0G1L9nDwAGlRVAbvTXeuKg0qj1IeUF3BwxHXLCqD6cMKpP1DaZkvz+61sSsmdBWox1Uy4tz9kNaDK9AwM7FYJ5bJvrw+/PJIJvj3Az0SVZ/cLP4ly+9I3X+9RDe2H4EbYF+wQ9HSbRLabR8u6zPX5F8RMLBT1Yf3nZ1nVRmxjLfxAtD19RfMU6tQpCkPIjHkRdUZjX5Nh3v0Fg+xmO8fkE3v5bYOzev0pq2CfoMeuRX8DJg0spL2x90bKjvPjNmW148ZjYrJ2gatP4JexRdn3w0TyH7mDrI+i86xv/+jhl+zHXO/QRdP5jnaDPPrwPX3DTsWI57+jBLL7i2A7Z++5nR7dq612Ux5MLpwMwZ1zfjgfpIZccNzRlG08ubMlzbmhsY0MLyM/xLzHr3KPijys7sP/cwgtumN+SQXbd3FGeHODZhHWCnpOVwfdPHcf/nD0BcG5oOG3iAC4/fhifG9+f3sV5vPadWTz19WN59NKjOXpoL9656TNsuG0+T195LBtum88RA3syfXgvXrhmJhceM4THL5vWPFf+my9NAVqEozgvO5DPNWVIr1aZFQU5mVw6Y1iCd8TnwmOGtGobV1nCje4P1+j+xfj/5fcvk2b++H4tPmL48dlHJG3nD19pfVfrrZ8fx/jKErIzM1odC/7+wsR3wcZydvXAqPWqXvmsvnUOAOVFuf70TVpnubTYipdyGcnfrzouan3VLXOi1ssKclj7g3nMPMx5yPzw3oVc85mRza/POrwl/XLeuH78+7pZvH9r/Pstwoh1eehNfH7iAHgCLps5HI6fEPVa7+I8ehc7c98PXnxUc3tTrvgTl0+P2n7ioFJW3dK1O31U3yKWb2xZP6xPEQ98ZSqlBTkMqyhg7bZ9/HTBRK548A0WLZxOaX4Ox97xAt8/dSxPv/sR//jAuYnju58dTb+SPG7920oAfvflqc32wDn7sJlEjyBcesNsygtzueph56ab208fx7f/9E7UNouvOJZ5P/0HANOHl3P9vMP5/uKVUdtMGNgT9mRBbUvbNZ8ZyTEjypkzpi9Pr/goavuyghx27Kvjxs+O5ua/vsegsnxuP2M8vYtz+dnzawAoyM0iNyuT8QNKoLAnHNiRUh/YzG2njeOxZTVxX8vNymBgxN3U88f3Iy87k8bsDKiHjAxh+XdPBOC3F0yNeu9Fxw1l655aBpblR+XtdzesFfSwUVmaT23/YgbV5vO7eVOZMbKi+bWfnDOR255axUlj+kQN1qblLxw5mFv++h7j3foaXzl2KOdMHURhbvfYvd886TDKC3Oj2k4a07dZ0P/7zCN44s3NjO5fzLIbZrO/rgGAOWP7thL0eJw3rQpoyXgaWl7Auu37AOfmsb+9vYVxlSVR+2ZkHy0kFsvSG2aT1cYzgVe7R9LX
zh3FbU+t4tQJTpmCDGl/Dj43KzPqx6C70j2+8Z7gf3bF8IpC+FCixBxg7IASfh9niiCS2GsBYRfzk8b0hQ/grrOOIPOI1tNSRe5UWVFuVtTdlb0Kc2m6f3dgWT6rbpnDqO8+3fJG90LerFF9YC1cOXtk87Rbhnv361dnDuObj70NwHlHDebKWSMYESPgnxnTh5PG9GHJio85Z0r0FIyvpHEWU4nbj2MHFPPu5j3N7Y9eejR9i1vOHC86diiH9yvmuBHlgcdoO+H+1iuhZcHUQfCBUy4gcr7371cdR6NxyjF88P257ZYgyMvOZNyAEs47enBU+5FDy2Bt/IvSIsIN8w/n1r+tpLIsnwHubfeR5GZl8ssvVnfy04Wbxy6dxr7aeibf+iwAU6qiq0ZmZrQ+qFGSw35BDyL9ztdb5WPj9/fRCoH68cVFTMJizP6PPFLObuP0PpInW1U7jNjfEfYvmzmcNzftYtao3vTMz+bsKQObzwSSI7JvPOocX8a/IVH/dtxO5Kd2lvKyMxPWAIpvxngQS/fAfkFXlIAY3ruQ5yOKZ3VMzP0mfdMWi3KzoK51+7+uPSH4YEKO5YIexCAOsAhYEIXAgvDjY3GuFh/+mvfNgV99k8Zpi0V5OXEFvX+cqSolNazLQ1cURVHio4KuKIoSElTQk0UvyHQTbN/PtsevpEIIBD2IrJDginMFUzQrKD9+Fo5q2ic++IiaP/bQvi9941cfe9C/TZ9XUs2YMTH/lUSEQNAVRUnnLBclOOwW9KCyQgIhgMyQwPwElUljo32f+iaNs1z0xyY4khJ0EZkjIqtFZI2IXBvn9atF5D0ReVtEnhORwfHsKIqiKP7RrqCLSCZwNzAXGA0sEJHYIuJvANXGmPHAY8AdXgeqKIqitE0yR+hTgTXGmHXGmDrgIeCUyA2MMS8YY/a7q68AbRc9thK9INMtsD2byfb4lZRIRtAHAJsi1mvctkRcCDwV7wURuVhElorI0m3btiUfpaIoitIunl4UFZFzgWrgznivG2PuNcZUG2OqKyo8qqamxbnS149vhaPAm+JRiYhfnCt1/CzO5eUY9bY4V0tonbSlZx1Jk0wtl81AZEHnSrctChGZDVwPzDDG1Ma+7g9ayyUt/YQi+0hruXhozENbSlskc4T+OjBCRIaISA5wDrAocgMRmQj8EjjZGLPV+zAVRVGU9mhX0I0x9cBCYAmwEnjEGLNCRG4WkZPdze4ECoFHReRNEVmUwJyiKIriE0mVzzXGLAYWx7TdGLE82+O4FEVRlA5i952iQaIXZroJtu9n2+NXUiEEgh5UVohfboIqmtWO30CcemEyqGwdH+ybhCteGfXIpEdFxDwb235nToWHEAh6EOhV+vTDx30SWP0eL/E6Zg/tWdmfdmK3oIciPa7ZEVqcqyMubLWvxbkU/7Bb0BVFUZRmVNAVRVFCggp60ujFmG6B7RfdbI9fSQn7BT2oR7b5RohquQSSsRNQVpAv9v2s5eIlXsXp0ZgzmuWSLPYLeuM8QvcAAAlySURBVBDoVfo0xM99YuP+1iwXxXpB1+JcaenHrwJUrXz46sAns1qcS/EPywVdURRFaUIFXVEUJSSooCuKooQEFfRk0avr3QTb97Pt8SupEAJBD1txrqD8WFo4q9Uj12wqzhXAI/k8MRnxCDovinM1XWD1ojiX/mC1id2CHlgtl7DUWAnKj+W1XPzOBNJaLopP2C3oiqIoSjMq6IqiKCFBBV1RFCUkqKAriqKEBPsFPbDiXF1VrMsvu7YWzvLZh6/ZQAEU5/LErhbnshX7BT0ItLhQGqLFufxFi3PZiOWCrsW5Ou4HLc6VnAOfzGpxLsU/LBd0RVEUpQkVdEVRlJCggq4oihISQiDoAWWFBJVNE5ifAHwE8Zg7z334mEURFbtXfnzoD6/i9Gxsay2XZAmBoCvdE73QFk0a94dmuQSG3YKuxbnS1I8W52rDuBbn6jIb4cduQVcURVGaUUFXFEUJCUkJ
uojMEZHVIrJGRK6N83quiDzsvv6qiFR5HaiiKIrSNu0KuohkAncDc4HRwAIRGR2z2YXATmPMcODHwO1eB6ooiqK0TVYS20wF1hhj1gGIyEPAKcB7EducAtzkLj8G/K+IiDEBVNJZfj+8v8R7u4cORK9vXQl3H+m9n8b66PVPt/jjp3Zv9PrKJ2HzUm99xO7uHeu9/ywHdroL7kWy526Bf/3MO/t7tkBuYcv6r0+EjGS+JkmwYx2Uj3CWNy/3pm+ax6nbH3+5DHIKUrN5YEeLvZfugNd+1Tk7DXVuaK6tX50AGZkdt2MaQTKc8fWL6eHImpnxLRh7uudmkxmpA4BNEes1QOxIbN7GGFMvIruBXsD2yI1E5GLgYoBBgwZ1MuQIjrsGPnondTuJGDwNKqcCAtl5/vnpNwGGz4Leh0PdXnzLtc07wfEx/QrY+E9/fPQZA6PmwYCJzhfRD4r6Qf+JMPVi2Puxt7YrDoNB02D4bBh7BjQe8tb2xC9C7aeQX+ad3cHTofoC2L/dsZ0qFaPgyEugpBL2bE7N1oBqmLAA3vhD5/uy92gYOQc+WNL6AMhW8nr6YlbaO4gWkTOAOcaYr7jrXwSONMYsjNjmXXebGnd9rbvN9ng2Aaqrq83SpR4fISqKooQcEVlmjKmO91oyF0U3AwMj1ivdtrjbiEgWUAJ80vFQFUVRlM6SjKC/DowQkSEikgOcAyyK2WYRcL67fAbwfCDz54qiKEoz7c6hu3PiC4ElQCZwnzFmhYjcDCw1xiwCfg08ICJrgB04oq8oiqIESFKX740xi4HFMW03RiwfBM70NjRFURSlI+idooqiKCFBBV1RFCUkqKAriqKEBBV0RVGUkNDujUW+ORbZBmzs5NvLibkLNU3QuDpGOsaVjjGBxtVRwhzXYGNMRbwXukzQU0FElia6U6or0bg6RjrGlY4xgcbVUbprXDrloiiKEhJU0BVFUUKCrYJ+b1cHkACNq2OkY1zpGBNoXB2lW8Zl5Ry6oiiK0hpbj9AVRVGUGFTQFUVRQoJ1gt7eA6s99jVQRF4QkfdEZIWIfN1tv0lENovIm+7fvIj3XOfGtlpETvIrbhHZICLvuP6Xum1lIvJ3EfnA/V/qtouI/NT1/baITIqwc767/Qcicn4if0nGdFhEn7wpIntE5Mqu6C8RuU9EtroPX2lq86x/RGSy2/9r3Pcm9Vy0BHHdKSKrXN+Pi0hPt71KRA5E9Ns97flP9Bk7EZNn+0yc0tuvuu0Pi1OGu7N99XBETBtE5M0g+8p9XyJd6PLxhTHGmj+c8r1rgaFADvAWMNpHf/2ASe5yEfA+zoOybwKuibP9aDemXGCIG2umH3EDG4DymLY7gGvd5WuB293lecBTOA+KPAp41W0vA9a5/0vd5VIP99VHwOCu6C/gOGAS8K4f/QO85m4r7nvnphDXZ4Asd/n2iLiqIreLsRPXf6LP2ImYPNtnwCPAOe7yPcBXO9tXMa//CLgxyL5yt02kC10+vmw7Qm9+YLUxpg5oemC1LxhjthhjlrvLnwIrcZ6fmohTgIeMMbXGmPXAGjfmoOI+Bfidu/w74PMR7fcbh1eAniLSDzgJ+LsxZocxZifwd2COR7HMAtYaY9q6G9i3/jLGvIxTmz/WX8r9475WbIx5xTjfvvsjbHU4LmPMM8aYpodlvoLzVLCEtOM/0WfsUExt0KF95h5ZnoDz8PikY2ovLtfuWcCDbdnwuq/cuBLpQpePL9sEPd4Dq9sSWM8QkSpgIvCq27TQPX26L+JULVF8fsRtgGdEZJk4D98G6GOM2eIufwT06YK4mjiH6C9bV/cXeNc/A9xlr+MD+DLOEVkTQ0TkDRF5SUSOjYg3kf9En7EzeLHPegG7In6wvOqrY4GPjTEfRLQF3lcxutDl48s2Qe8SRKQQ+BNwpTFmD/ALYBgwAdiCc+oXNMcYYyYBc4HLReS4yBfdX/YuyUl150hPBh51m9Khv6Loyv5JhIhcD9QDf3Cb
tgCDjDETgauBP4pIcbL2UvyMabfPYlhA9AFD4H0VRxdSsucFtgl6Mg+s9hQRycbZaX8wxvwZwBjzsTGmwRjTCPwK53Szrfg8j9sYs9n9vxV43I3hY/d0relUc2vQcbnMBZYbYz52Y+zy/nLxqn82Ez0tknJ8IvIl4LPAF1wxwJ3W+MRdXoYzRz2yHf+JPmOH8HCffYIzxZAV095pXFunAQ9HxBtoX8XThTbsBTe+kploT5c/nEfmrcO5GNN04WWMj/4EZ/7qf2La+0UsX4UzpwgwhugLRutwLhZ5GjdQABRFLP8LZ+77TqIvytzhLs8n+qLMa6blosx6nAsype5ymQf99hBwQVf3FzEXyrzsH1pftJqXQlxzgPeAipjtKoBMd3kozpe6Tf+JPmMnYvJsn+GcqUVeFL2ss30V0V8vdWFfJdKFLh9fvgihn384V4zfx/kFvt5nX8fgnDa9Dbzp/s0DHgDecdsXxQz+693YVhNxZdrLuN0B+5b7t6LJHs585XPAB8CzEYNDgLtd3+8A1RG2voxzYWsNESKcQmwFOEdlJRFtgfcXzun4FuAQzhzkhV72D1ANvOu+539x77ruZFxrcOZSm8bYPe62p7v7901gOfC59vwn+oydiMmzfeaO19fcz/kokNvZvnLbfwtcGrNtIH3Vji50+fjSW/8VRVFCgm1z6IqiKEoCVNAVRVFCggq6oihKSFBBVxRFCQkq6IqiKCFBBV1RFCUkqKAriqKEhP8D4vS6/QsNyZ8AAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGwAAAD8CAYAAACSAEGOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOy9WZClyXXf9zuZ3373W2t3V2+zz2AAcAbLAKABwhRok7RM2rRIUVIoQgxGwGGFbDP0YCn84vCbFH4QSdmizAjJFBkKkpZs2aICMiIIkRKJZQgQg+Gs6O7pvbu61lt3/7bM9EPerp4BpwcDAiCrwToRFXXr1r15vy/Pzcyz/M//iHOOY3lwRP1ZX8CxfGtyrLAHTI4V9oDJscIeMDlW2AMmxwp7wOS7ojAR+WER+bqIXBKRv/vd+Iw/ryLfaT9MRDRwAfgh4CbwZeCvOOde/Y5+0J9T+W6ssA8Dl5xzl51zJfAbwI9/Fz7nz6UE34UxTwE33vT3TeC5d3pDFGQujbrgnP8RARGcgDhwItQNhRioMwinoGqHiQSxoCqLjRRSOVRlDsd1WoFzuFDBmzYSsQ4xDifgQv++OlOIvfs+UJV/HQ6cFhDQuYG6hiDw4ypBHGCtv14l/j0ii4EcTiuqpkKXYAMIcgfWYSOFE0A4/Fw9LUGEeTWkNHN5u7n6bijsXYmIfBr4NECcdnn6R/82VkOdKlTtsNrffNUQTCzMvn9CNYtQBwHxvsJpP06dOnQJYoVs09G4Y6gTRd4TVA11KtgIbAi68IoAiIZeg/NVoeg6yhVDcifABg4xgo0dUgGLaYv3hdYNQ9Xw1ycGTCSEM4uu7inWLV5/Vwl779Hkq4ZwbU65l5DcCQgnULWgalls7EjuaHQFzRsWp+HVf/Pz952374bCbgGn3/T3xuK5t4hz7peBXwZoq75r/d5lqErMcESwtoqbzUEJrqxAa+w/HAMgH3waefUyaI0rS1xVgzWoVgvV7UBd46yF0mvGDAbopT5mbx8AvbaKPRii0gRnLHY8fuuFifhVfleUBmvQjz0MwzFuPAERJMtweY5of6rYeY5oDVqDtUgUAZD9q4H//8efIfjqK/DwaeT2DmZ3D5Vl2Nns8KP02ioYgx7n953c74bREeCNjr+AV9SXgb/qnHvlfu9pS989J3/hO3odD7I87z7HyO3/6WyJzrlaRP4W8FlAA//0nZQFUK802PnJjyIGqpZgIqgzRzATbAj52ZK///F/wR9OzxOK4V+98T6qMmC1N2Z/knGqN2RWhey8sEa8L+Bgvu73JBeAjSxSKVxmoBKCiSaYCnXqqDuGqJ/znhObXBksURm/157qDNkct9DiCLRlb69J9mpC1XKoUjCpQ1UQjoRw4hDrzz7cve1QHOx9pOKvfeB5ElVxYbrK5y89jC00Safg7NI+ShyXd5aoa03yYubP6V/50n3n6rtyhjnnPgN85l2/XkPRE2zgH9cN688OBXXDEWYVGkdtFcvxmJXWlFjXFCbgqbU73Jp0aEYlVccgtUYZwWQW3S1xuzGqVRFEBlNrzFxjT1ZUd2JsCAisdcc82trh9qRDIy5pxzmzKmKjM0SJYy0Z8wfVGfJejNP+7CFwSOG/UCb2xg/CoXEjFpyCIK35kfaLXK1WmJmIh07ucnVrCa0t3XiOwqG1Zbk9ZaedgvPGyf3kz8zoeLMEc8faV0qqpsYpqDJ1aFE5BcOHG/zj1R/g1qBDElUM7rRRU42sFlwfrAGwN1F0bwrxwBLOLdNVjQsydOGwOiFfEpIp4Py4wQzqFEBzQy2zudehHkUEQ83tzPpV1DJIqXglMURbIWtfteR9hc4FG/ovWDx0h
FODGIc4kNriAn+uOYHNLOUfnv0UtycdblxfJtoKCOeCyuEP3pvgrBDshtxOmqy8DuIcm/N3mKvvujbehVRN4eYnQ1wAda8G45C0hkkIzYqnz93mH53/l7xa9ljSUz577r1sVy0eTbfYrVqEYpjZiP/3ynsZXm8jTkhOjxCB3CjSuKShHM24IK8DhtOUfOqNgtWVEX/t5Nf5aPMivzt6EuBwvHPJLptllzPxHp8fPMKXmo/jGhVoB0aQwMEwJJjqhfuxuKHFShMDj378Kv/rmd8iFMVnz6/zynyD37nzGO0457/Z+B1uVX12qxZDk/Iv1HMgUP27+8/Vd9zo+JPIsdHxVnkno+M4+PuAybHCHjA5VtgDJscKe8DkSFiJEgYEy2sQhrj5HIy99784wjUzpDaYpRZ6fwJlhSsrzO4ueqkPZYUZjQjOnsZNplDVsLoESuE2t5GNdWSWU5/oof7oEqrfw7UbsDOAqgStqR/30bTg5StIlvprGY+RZhNXVdQPnSC4vIlkKW4yw02nyKl13O0tVKvpQ2V1jSjlQ2PWIXFEffMW5pPPEl+8g13ponaHuPkc6bRxSYTkJVS1v56tXRCF7N9fLcdW4hGUP9XQ1J9EXCvDfPBZVGUpuyEm9ukQVTuchryr2f2gJbuhqdruMPwj1oehwrGPKnTfMETDGhMrqpamSoWgcFSp4AIIJw4bCOIc4cw/rmOh6ArzdUey7SMXKLAagjmoEqYbjtZV6L5R+rEbClWDDYV43weZlVlcl3GHB40TYfBYzPBxfx/hUJHdcegclIHROSE+gKLnP6f/uk8Nud/+4n3n6niFHUE59sO+h+RIbIk0Uuyz3wdaqLIAVXmjI5jVlJ2IsqOZLyuwYBIf8onGDhP5raVO/Va29vwMG2nEOqYnI/+6kaHOFOluyXgjpk6EeOwIZpaqoTCRUPSEqgHpjiOY+x3HxIKJ/eeUTSGcQftKjlhHvhyRXZ8y32gQDUofQzQWG2r0vMKGGllkm/OViN2nA4olS/sNRTR0tG4WzNYixmcUyY6jToU6g+WX/FjuC/ffEo+EwqS2RLcPcHGIdDOCgxnUBgZD9KlVoMngcUWyA9mWJZpan4rJFOHUkvc0yjhUaQj3pthGTLKvCUclxVJMdqcg2B6RpkukN0ZU/QxV1GRFTbHWwCQhYoV0339RTCRkNyvyfkA0NFgdEE4t4Z0h9XKLeL9CTeaEk5jo9oGHAgQarRQoQReLtLZz2HCJ7I6mdwE6//4S+fvP4JTf7YIpdK4WHDwcE40huTUBeAvM4RvlSCjMBQrTb+IChaot+ak2qnaw0cVqj+eI933qZfioIt0SkoFjdFYRjYRgDlUkuEBRdzPEOb8SVmKiYc1kIyFuhdhQmJ1rYwNB55pwonHK57NMLARTw8HDEbpwmChEjGP3fSHRyBHOBdtt+GtNNfZ0j2hvhllqQW1xocLGAVL71JAqar/ClkOqhlD0hc4rPQ4eCsl27cJwcuw9leAUmAjQ/h7uKvTt5EgoTCqD3hv7VWUtcdFG5iViLMzmJP0OVvdp/z8voPs9Rh89S+PGjMZmQHR1F5cXuPEEOX0Smfn0uh4kmE5KcHuf8KCDTOYwnECvjW2n/vOqGr3cIX0jx/Qa6HFOcktTrjYIZhV6lJNttYgGuU+fTOaovQPC1SVsFiOzAqkM1AYpK1DK+4Ba4QKN1Ib2pGC2vEQ0hvlGi96FkvjmAezsM/yhx+m8sMX8oT4uENR47t/3Jj/0j83VkbASW6fch575m9hIY0Oh7HhUUjixFF1N1RD2Pmh8clAgPNAEY49YqjOHzgVdQOeyId0tsVpR9APyjiKaOupYqBoe1GMjIdm3hFNHnQizNUWdQdG3RANF3XAesFOCnnkAT505uq9D93JB2QmwgXcndOlXsqocwbTGKcFp7zZIZXFasfn9KUXfYVZL1G5IMBPal31K6eA9NVIq9FyID4SlV2qkdrzwe7/IeHTzbZfZ0VDYsVn/F
jk267+H5GicYWmCfugxAKp+RnCQMz/TIpzW1InGJIrbn1A0rypU5XGIuvQgmDrxZrnOHb0LJcGkBCWIcdTNkPlySDizlC1NNPK4wiB3hJMaVRimpxLioeHOcxHZbZ/mL7oeONq7UDNd1x5WMHf0XvKWbN0IFyBSv/UFoxypDE57AI9UNS4MkKrGZjHXfqxDvlbTvBIQjRyt6zUmUcxWFMnAMjmpCSeOpVemOBH42ufvO1dHQmE2VOQnW9RNf8P5akwws9SJps40875CVWBiGD/kCEeKaHgP+JLdsSgD4zMRzdsezFk1ve+mc2+mly3v48VDS5UpxGrK1QhdOnafjnDKj1W2vU8kFvaeCqjajuy2kBxYZmfbHhVcOMqOV2Q4s1TtEPChNBv4L8tdfONsLSRfq/1eJiwwKiE4mK17X82GUHaF2YkUXd7DhLydHJ9hR1COfPBXkhj16BNIVcPuvkfVthuwO4B+B3YPmH7kPLqwRP/ua/DMk+ibO1QPrSOf/xqz//I5Wq/sIkWJyxJvwtc1rtfGXriMPrGOi0MYDJF2i/z8MvHmCGqDu72FnD2FefUC+qnHsG9cQ2UZZjhCve9xZFbA/gHVE6eJbu5jmxkuDVGzcoG5Fw8aqmowC4dXBAK/W5jXLjL9S8/RfmmP6WN9ooOK6NImBAFuNsPs7aN7PeYffpj0+tC7CNd///5zdbzCjp4cW4nfQ3IktkSaKe5970dVBhtqv9WwKAWqHbMTMVUqxCPLwSOacOxIBt7BDXJH0VJkO4Z0c0rdilGld7LLbkQ8KJicyahjoXtxSr6cABDMDXWqfUhqZpmvBHQuzSi7EePT3pqLJpa8qwlnljoRGpsV4aigbkWL6hpQxiKlBeWNDKm9wyzGIg4mZzJ2nlUku4KqQOeOdM9SNv1aUbUHy+rS0b44xokgr93fSjwSK0yMQ88rH0NT96wsX5/laL9+wMHjsPUhhSo9rDvdrahjIZxY+q9MsJEwOddEFQaxjmI5pmxr6kZI41bO0vPb7D3dQNXOuwupJtme035xm2BuiCaWze9vMFsJ6F0oaNwpybsaE7FQDISD3H+hRNCVRReG4CAHJahZhSqN/6mtV1hR0f7c6/Rec8zXHFUTwpk/gtL9mvmqEM4t0cQyPaGpOzE2C49+LNFpwWQhLlRUzQCnhDoRkr0amyr2n0qpNwrCazHT05bWVcX+EzFlG2wUwLkmRU9o3fAmthjHvK/JdmvmKyFVFlE/kx1mnG0Q+JRMHDD60BrjMwoTQ+uaY3pKqJoJ+ZLPOAdzGDyuadxylCspZVujKkfZDImmliDVBHODzUKwDhbpHRcGKKXY+umnGD0MpmFQpcaE3sWoM03Rc+y+J0CXkG47qkaALuxhTdrbyZFRWLHkc1XioI5l4TtFhDNHviw0WjmTU4qoUTLSGekdxex8hbwRYlKwoaPoKMQG6NKBwHgjQBfetxIDZUuYbAQku97pnp5IqVPv35U9y34bwrEw3fDVmShhdsqi54LUQjTVWC0ULUVQOPKuRqeKeOQhDSZWi4pKv4rEQL4iBGfHnO6NuJqscBD4eGLdcFRdgxhBTxVlB6Kpos6Uj1XeR46EwlRe03xlGxdobDtFHUwpN3qEuzNcqIlGGVthl6wEVUV0dn3Ja+u6pmo4mrdq5ssB7Ssznw/LIkwzQhZOqKoMalIyebRD+pXcT2rtz52iH1M1NeMNhS4g3bMEucUpQeeW0dkAVcHSyxP03gTbyZDaYhoRelIgeXVozoux9woCawNhAKwy2WqyudGkfwuyHYsJ/ZY+ORn5xOmeJdmvCYcFGOdLc+8jR0JhVStk9z86gVN+NaiqRZ0KYhLEwfCJmr/y0d/nq4PTPNbe5v+7+BR1qYmziqrSjJSlmY24+nqfxs2MYOaYnParyobOR9zbCWouqDJDF0I09NUrJnNUpwuWl8bMioiDQYqKDatLI3bGGXWt0dqy91xM56UmdQqqXlx3MyPZg3DsDQfwN
WG4BSgHGDwp/NiPfImtokVpA55/6RGCoaLuK7qrAyqjGWw3IIbOV9uIhfraMcztgZIjH+koTzS48bMfw4VgEkedWVzokFJB4Fg+t89vPv1/cMM0GduEX9/+CEosZ9IB+1UDhWM9HvJrr30Yd7nht7zzU6o8IGmUiDji0C+Lsg6YDhNUYLGlJswqPnjmOp/oXeB6scRm0WE1HlNbhUGxWzR5tn2df37lgwxfX8J0al9uVCnQDj3SRAcKqX1GXAyH5UZOwfInNvnFx36D90Yhn5vH/KNbP8ilvWVOdYZ8aOkat/IuANcnPa69eBKphfLn71+BeWRXmF5bxWxtv+3r1dNPIDc34cQq5vVLby0ifwdRWYYrSxCFq8pv+vrg7Gns7j52OvVPfGPB+rsU/fgjmK9fIjh3BnPrDipNMKMRenkJs7tHcO4M9fVbYP3Z9U4r7JsqTET+KfAXgW3n3NOL5/rAbwLngKvATznnBiIiwC8APwrMgL/hnPvqN7uhTrLuPnr+Z5Dp3McQywrbbaDmFTYK0IMxN35ig2zbGwPB3BLOFoZBYZkvBUQTS+PCvo8nRqH/SQKkMlS99O7NgoVgXGCTAD0uqJYzqkZA2daEU0vRUcRDiw083USQW0ykiA9q4tsTHzNUgkzmmNUOalJ4o0MEqQ0u0N6BtguOjpMd6jRg/4mQcOzI9gyNSyOK9QbiINqbMz/VAAvZ5QFS1Xzh5q8xzO/8iUNTvwL88Dc893eBzznnHgU+t/gb4EeARxc/nwZ+6V2Mf4+IZBEwJQxQwwVyKlAU55c9r0VDqFMIJ/aQxMQGgqodVbYoU01jP2FF6QEzzZhgWnmr0DhMrKhbMXp3TN1OMKEivTXxhkJpad4sifdLTCREY4OJPDkL1iHW+hxXI6Y+1Ucqg23G3hp0DttI/WOtveKUIvr6bSYnAlo3DfHIonPH/HQLGyucFjY/3iWYGsJxheQL5b/DGnpXW6KInAP+zZtW2NeBTzrnNkXkBPC7zrnHReR/Xzz+9W983TuN/+YtUYIAV9ff9JreLLrdxkymh1vK297D3XHftK1JGPmtccHF8WZRjca9rfDN48Qxrij++LiHb/zjY32r8t0I/q69SQl3gLXF47ejLTr1dgOIyKdF5Csi8pWKexPwrSoLwIxG33SSDsd90xf08Bx7m/e+nbKAtyjrLeMevvHbU9Y3k2/bSnTOORH5lk/itzDhtE4586FnAXyBQlNjAyE6qCk7AVWm2P5kRbQZHiKkdO4Dv3UimMT7Rr0LFTq3mFhRdDRVUwinDl36ADFAcuDPqXBqicYelbX/lEdHRQPBaQhmPvpRtRzJnkcVx3uO3qWcohcuqBmEcGJwSlDGoQuD1O6QX0qsj5ZMziTc+YQl7s+RV1u0rrjFfcLspNC46ZidEEziOPGFGl0Y+PJ3Hvm7JSIn3rQl3jXn3hVt0R+TaU700lVQGjedEhmDOncad3OTdGUJe2ebpc9kTD/2CPFugX75MrKxjnntIrrX8/REa6uHdEWuKEi1xo7HBKc3qG/cpHN6A7TC3tkmKwpvmV25RgR0ux0IAqr3nCH8o6tIswFVhVvqIqMprpliXrtIsL5GMJ4gaQKioNeG/QNcUSJB4CP2d0UEyorexYTeZwrMwRD1vidQgwluNsOdXsf+E89IuLK4B9VqIVoh8+KPz9HdYf+EZ9j/Auw55/7egsCy75z7H0TkPwP+Ft5KfA74Refch7/Z+MeO81vl23KcReTXgU8CyyJyE/ifgL8H/J8i8rPANeCnFi//DF5Zl/Bm/c+8mwuUJEaffZh6peW3mMpQ9GKya0NMM2a2kbH9rCc0aV13JAPDvK9pblYMz4ck+57wJNu2pFs5qjSU/YT49oTpw22iYe2zAIu7jYY1OKgzn2LZf7pF2fYFEb0L/gyarmmCuaNxp2Z0LqB7sSTenTM73STeL31GIdMEM0O4O8GFekHpJ8iCjg/ApiHXf7hNNPJw7JWvFYiD+UqIL
hyTU5p01+IUtK7lPh754v3zYUfCce7Ea+5j638Vl8beHG9m3qRXgtMaKUrMxcvUP/gBxmcieq9OCG7sYJe6uFCTn8iId3NcoAje2EREcP0OTvlzq+4mBJOSqpcQjAr0YIrNEu+nlTVyewd7dp35iQbh1Csz3J1hX34d99H3E+xPqVZbRFe2PS9iHCLGYvpN9PbQl76Wlb9mgDjy23OgcVFIvdKi7IaMTwW0r1XE23P0zgHl+VVstLjGTNN8eQsXBnzx6q/c1w87GgpLTriPnvnrHpsOsKC0c3GIFBXFiTYHj0beGIi8gaGMI5w6ZiuKpZdn5CsxyW7pI+i1xaYh+tYu1fk1bKQpuwGqdCQ7OXpzH7vcQYqKut9gejJmclLT3LTEBzWz1WABV/PGh6ph+cv7h5lw2dzDnluH2oIW9N7Yk2kuiDlxzq8UY7CtBnd+oE+ybxeJUEd2O2d0PqVq+EKO5q2S5Oqex9WXlXecy+2jG0vE1LC9h7RbYAyuNlAUSBQijYzw89dZ2zxD3W8gn/8a6v1PwuWb1N/3CK3fu460W8gXrhI8dA57ZxunNbrdgjRBf+0iQatJOBzhjIWnH/Wm/cVrcGIVPS3pfnmf1nIbdTD1KKqPPEm0PUUNRuSPnyD+ykXYWEeNcx+FmU7hhdeQIEAaGYQhFB7AetcPc9Z5ZNSdbU6UFebCGwQbp3yB+8Mn6L4+xiYhdSMg3pp4XP7mtv+yvsMiOhIrrNXdcM9+7L/DxL6YwIQ+kRmNDLOVgLwvjJ6uQBzJ9QhVQbLnqDNP0FUs+Xto3BA61ypMpJgvK0wsi9Ih/ztfEpq3fD4KgWhsyXuK6SkhHtwlC+OQ7dSkjnAkHoyaQveSoWyqQ9IyJ3eJxuxb6pp9vs0nUQ8eDhk+XSGpQW1HNG4oXyTYEWbrDl2A1EIwg95F79O9+Du/wPjg7YshjsQKU4Uhu7CDS2Ns5ON/aMFFAemVOXsfXkFeCokHDl1asu2S+XKI7PmIuLoMJhRaVyaovELyivBc77CipOyExIOSfDkiXVCl3cWLqDNNnFIkB17pzRs5B4/6THT/9YI61ZQtTfeFHR+fTEPqZkh8bd+fXdP5vTPXeiyHU+J/hwF5b4XoCwHztZDWdUv3s68x+sHHSAag54rexZKiG5BtemMJ41DFO0RsjsoKe/8P/vfYQKgyhS79qgjmlrLlK/Z3PmzBCC62xNsaVQl6DnXDswiku76ENh5agtzHGOfLinjon68yfybVDV/Al+4b8o72K7QvTDcs4UgIZn51W+3PLhP7JGgwE9afL6ja3qm32q/uaGoXlqHnAb67Ou+mWepE2PqIYFdKwmsxwUxIdzxy6s5zCgGiA+/8dy/4Qr9X/u3PM9m/cXSNjmM/7K1yDCT9HpJjhT1gcqywB0yOhJUoSUxw4gy2nSGVwbQS6lZEtDWFQFF1E/aeTjCxP8yXXvX8wLNlRTTxPBpBDu3Lc6Jru9h+i7qboOY1xXKCqh1FNyDdLj09UjsknNUEgzmzM23yvqboCNHYw7/jA2/sxCNv9gdzR7Zdk9wcUXdTXKjQo5KqnxDtznxytKrvJWKtLyqXqqZe77LzTIOiK+gSOlcMTrwDbSLlsw0xJANL6/WhLxK8eP/qlSOhMBtpZo+v4pRgUoWeW6qWxkQtbCSMTwWMzzps5GjcUGw/EyHWpz/yud8kghziYUTdWEeMo+gFVFniowu17zChqpA6udu6AUZnepjId4cwieeOsiEUHV8RefCw9qmbUhAbUDV63o2oHdWpmGDuqJotgukCl7goMBTrO1ughK0PRhRLFicOVYsH7DjIl72VWTUd0UgoWxoxHf/eG/q+c3UkFCbGV+ED6NJDneNBQdUMQSl6l0rqZkTR9wRfvQslk1MRbAHO+YpJA/FBhZTeF6paAUHuaN6Yg4j3wbZyZidTwlHtQ1szRfN2jYlj6hpWX8iZrfkvQzg1uJse5z/ZCGhfnnmfrhcRjmriPYdJPEybu
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
[base64 PNG data omitted]\n",
+      "text/plain": [
+       "
"
+      ]
+     },
+     "metadata": {
+      "needs_background": "light"
+     },
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.54] \n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch 4, training avg cost 0.538858\n"
+     ]
+    },
+    {
+     "data": {
+      "image/png": "
cUTcUWUfBWEujL4Ju48DynIpaH9zrf6KXCRA0KNCO1OPILapcf5FzNjGL4IgvoIeVVpVZOlbEWXTROIngoyd2NoPqW+Ctqm0xVgSX0GPEu3bRUiYg6IBVx/EEwm6EEIkBAm6EEIkBAm6EEIkhPgLetzT/DzjOZbD8hGRHxXnCt92KGmcQacbtl7urBkXQCw9g/gLuuiZKAujFcpyERL0PElYca5IUv5iWjwLwo1fxblEiEjQhRAiIUjQhRAiIUjQhRAiISRA0FWcq9M+IvOj4lzZ5uJSnCsg+yrOFTkJEPSo0M7UI4h7alzc4xcFEV9BV3GuIvaj4lztGFZxLhEa8RX0KNHOXYSoOFe4qA/iiARdCCESggRdCCESggRdCCESQvwFPao0v1BRca4u2W/jZbCu4lScK8g5bxXniisxFvQos08If2dSLZcu+AiDMAQyg9BruQS0n+qaaCyJsaBHifbuHoWymtA+H08k6EIIkRAk6EIIkRDyEnQzm29ma8xsnZld38b7o8zsYTN71syeN7Pzgw9VCCFEe3Qo6GaWBm4FFgBTgMVmNqVVs68C9zjnZgCLgP8MOtDcqDhXp31E6Sd0+yrOFThBxaniXJGTzxH6bGCdc+5V59xh4C7gglZtHNDPX+4PvBlciMWCdqYeQWxT4yLKxhJFTT6CPgJ4I+P1Jn9dJjcBl5rZJmAZ8Jm2DJnZEjOrMbOa7du3dyHcLGOFfb7Y/Kg4VyddxNW+inOJ8Ajqouhi4A7nXDVwPvALMzvCtnPudufcLOfcrKqqqoBcR4B27iJExbnCRX0QR/IR9M3AyIzX1f66TC4H7gFwzj0BVACDgwhQCCFEfuQj6MuBCWY21szK8C56Lm3V5nVgHoCZHYsn6AXOqQghhOgMHQq6c64euAq4D1iNl83ykpndbGYL/WafBz5hZs8BvwY+6lxEV2fimq2R7SzHclg+IvITSS2XEMcpVrVcgkS1XOJKST6NnHPL8C52Zq67IWN5FTA32NCEEPmjOW8R6ztFVZyraP2oOFduYlOcSz8QcSTGgh4l2rmLjxDHRGKG9vl4IkEXQoiEIEEXQoiEIEEXQoiEkABBV3Gu/Hy04zNQPyGnYKo4V2uj/t8A57xVnCu2xFfQI6/lEvbOpFounXMRkv0m4YlrLZegfqh1YTiWxFfQhRBCZCFBzwcdrRQhKs4VLuqDOCJBF0KIhCBBF0KIhBB/QVdxri74iMiPinOFb7tHFecqIJQeQvwFXQiB5rwFxFrQVZyraP1EtR2hoOJc2fYCMRagLdEeMRZ0IYQQmUjQ80JHGMWHqi2GS4B9oP6MDAm6EEIkhAQIegJquWS5jShrJ5KaMarlkm0uzFouAaJaLrElAYIuhNC0oIA4C7qKcxWxnxhn0qg4V7a9YIwFaEu0R3wFXQghRBYS9HzQVfoiRMW5wkVZLnFEgi6EEAlBgi6EEAkh/oIe9+JckaXhqThX9xNC3zTbCXJao1iLc8VprLuHGAt6xLVcQncTVS0XIspAiWmWS4uDkMyGXculGO1pDj0qYizoEaOjgx5C3Mc57vGLQpCgCyFEQpCg54VOGYsPFecKF6UtxhEJuhBCJIS8BN3M5pvZGjNbZ2bX52hziZmtMrOXzOzOYMNsj5gX5zqiaFZUfqLIDop7ca62/BWJrRajIZhUca64UtJRAzNLA7cC5wCbgOVmttQ5tyqjzQTgS8Bc59wuMxsSVsBCCCHaJp8j9NnAOufcq865w8BdwAWt2nwCuNU5twvAObct2DDbIPLiXKE7QsW5OuMixvbDLM5VlPY0hx4V+Qj6COCNjNeb/HWZTAQmmtlfzexJM5vfliEzW2JmNWZWs3379q5F3G3odK9HEPf01LjHLwoiqIuiJcAE4AxgMfAjMxvQupFz7nbn3Czn3KyqqqqAXEeArtIXISrOFS7Kcokj+Qj6ZmBkxutqf10mm4Clzrk659wG4BU8gRdCCBER+Qj6cmCCmY01szJgEbC0
VZvf4x2dY2aD8aZgXg0wztzEvZZLd9RYCZWE1XIJ1H6YtVyCRLVc4kqHgu6cqweuAu4DVgP3OOdeMrObzWyh3+w+4G0zWwU8DPyjc+7tsIIWQrRG0xoij7RFAOfcMmBZq3U3ZCw74Fr/X0SoOFfR+lFxrnbMxqU4V9EaE+2gO0XzRad7IhZoP+3JSNCFECIhSNDzQqeMxUfMbvyJHUpbjCMSdCGESAgJEPSYp/l1S9GsqPwkoDhXkPZD7ZsAj4LDSKlUca5IiK+gJ7KWS1R+Isqmkf22DIfT/arlIoizoAshhMhCgp43Ot3rEcQ9PTXu8YuCkKDng67S9zA03spyiScSdCGESAjxF3QV5ypePyrO1dpY8HZVnEtkEH9BF0KgaSIBsRb0hKUTGirO1WkfoToIyWxcinMpbTGO5FVtUaDTvR5D3Mc57vEHy4rXdvLjxzcU3dd38exRnDYx+Ke2SdCFEInlD89t4U8vvsX4IX27O5Qsdh+oC8WuBD0vdMpYfKg4V7gkJ22xsqKU+z93erfGEBUxnkNvQrVciteParlkmwqjb8Lu4wCyUwqm67VcnHPd/XsSKQkQdCGEziLbxtGzeia+gp7I4lxR+IrCT1SZNGHaD9FwGLEry6VNiu1iaNjEV9CFEKIDHA7rQXMuEnQhMon7IV3c4w8Y5zTlIoQQicDR7Uk2kSJBz4eetEfEhjDHROMdbNpicKY6i3fC0nPGM/6CHlrWoopzFewnij5UcS5/IUjRCqM4VwG2mrMWu/J5pS3GhKTVcokgM6TZTwQ+VMslh1nVcomSnjaHrjtFO+BQQwN1h+rpVV7Ki2+8w+ThlZSkUqRT3m7S0Og4XN/I39bv4MxJQ0ilWnaf5Rt3Mmv0QFa+8Q5v7DpAeUmK0YN6U9/gGNqvgqrKcu/Ghwi358XNuxlf18jm7Xv55zuWs+S0Y3h4zXbeP+NoNmzfhxlc8ctnmHp0Pz59xnjqGhpJpYwfPLKeiUO926f/Z+WbjDqqN3UNjcwcNZDzpg1j1FG9OVTXwGxg78F6Nr1VS3V5PdfcsZxPnHoMNRt3cubkIezcd5gG57jsp8uZOWoAV501nkN1jTQ4x/ceXMsp46tYtWU3L7+1h7K011+ThlVy6Umj2VZ7iIm1BxgWYX+FQfMBZ6PL2l8KYfeBOvoDb9UeZMjwbLtv7T7IwD6llJekAVi3bS9VfctZs3UPw/pVUF6awgyGVFZkfWYY8Pi67fQasJN39tcxtF8Fm3YdoPZAHeOG9AGMpza8zb5D9WzedYAX36zlve8azsDeZVw4cwSN++sYEMjWdR3netaMqQQ9B7UH6+gH/PDRV5mzazcOY9Gtfw3cTxl1vFLRcbug+May1dxW2sBjr2znwfptPPjyNgBue3R9VruX3qzlyjufyVq3ektt8/LrO/cDcO8LW7j3hS3N618ub+BXT73OJDvIIdvLQ5u38ZDv49sPvJJl75nX3+Fjd9RkrXtl696s19v2HGL5xl388snXAViYepnvlXV6sztBuFM4z77xDumDu9nujMu/vKxge+NsMw+Ww4Mvb+OiNCz5eQ3Pu10F2Xy0bD9vU8KwFPx2xWZ+v/yJvD/73T+vBeDGpS/Rj708X9G9N/c4HNaDjtFjPOUSLss3el+KV7fvi8TfX9buiMRPUthzsL67Q+g0dQ2N7AmpKFMxc7Cusdt897QjdAl6DppOTwGimBR5Yv3bodrfVnswVPtRs+9wQ3jGQxru1Vv2hGM4BILc5w/Wd6Ogozn0oubA4QZeenM3j63ZwrXAt+9/mf/4473N7/fvVRpIacoxtoVHygs20yGWcYp/oK6BXs6bkz9wuAGHozSdolep9+OSShm1B+t4YdNu5ow9itqD9fSrKCGdMg7VN7Jj7yH6lpeQShnlJanmuf5ttQd5ZM02Lgl/c7K+POF8kbz+2lrrzfEeqm9gxbodDOtfQUOjY3DfcnqVpTnc0MjK19/BAVV9yykrSVFekqKh0fFW
7UEanaO8JMWQygp6laXZf6iB/XX1jK1vpBx4Y9cBRgLrttdSb2XsO9TApGGV1B6oo6I0za79h3lg1VZOn1hFQ6M3TiVpY9OuAxw43EB5aYopw/uRThlv7NxPyozGxhZhs5CndgohM7ZC4mw9/rv2HcJZiv69SkkZ1B6op76x0R+btLfuYD11DY2UplP0KU+TNiNNIwbUNTQw4fp7GdqvnMnD+vHoK9vb9JtOGRfNGMGGHfuoeW0XIwb06vI2xI3YCfpHfvo0T2/YSZoGrm1j7jmsOsNR8NDL25hoeznnq38M3PZJqc1cEurcc7T85yPr+WEZfO7ulSxrDO4ixI0lr/GhXo5fP/06XyiF8//9cQ5TmrP9v/7x5bxt/y7E/i/en4cWZn79AVwXJgWeL6+nn0Gjv5Fbaw+xtbZtMQcvUeE3KzY1v978zoFO+4wrefWumc03szVmts7Mrm+n3cVm5sxsVnAhZjN/qpfjEFVuSHTZ6BbJNkXhxxHF+IRn/8Dh+tDsh9f/wdoMMsY4/NgkhQ4F3czSwK3AAmAKsNjMprTRrhK4Bngq6CAz6VWW7riREEL0QPI5Qp8NrHPOveqcOwzcBVzQRruvAd8EQr36Vprunuu4xTznKUQT2k97Nvmo4wjgjYzXm/x1zZjZTGCkc+5e2sHMlphZjZnVbN+eew6sPUrTPematRBC5E/Bh7tmlgK+A3y+o7bOududc7Occ7Oqqrr2xOvWR+hhyXumXeei+xEJ6wirtd3wjuSCyZDIRZPNMCqYNJE5fxzkNgSVPZLLZlAYmf1bSJZL01hZs92uxlNoLD2FfAR9MzAy43W1v66JSmAa8IiZbQROApaGdWG0u6ZchChmoi0gIYqVfNRxOTDBzMaaWRmwCFja9KZzbrdzbrBzboxzbgzwJLDQOVfTtrnCaJpyiTL7JBo/EMUtEM5FkeUSjY842nchHWcGXXQy2CwX/dhERYeC7pyrB64C7gNWA/c4514ys5vNbGHYAbamTEfoQgjRJnndWOScWwYsa7Xuhhxtzyg8rNyUlkjQRXjEfZ5Wx8I9m9ipYzqgcqNCCJE0Yifo3SHn8T5mSyZhzstqzld3isaV+Al6q1qYUaX5hUV06YTt+w3Dblgpdbn8FT+ZfRMMoRQRsGDiDGrftshSBuJP/AS9uwMQogjRWYWAGAp604MNotuBo0pbjOZY0xH+KXBUPuJoP6xjzeIW9GKOLVnETtAboniSvBBCxJDYCXp3Jbnkc/z80ZPHhB9IRKz7xgL+61MnB2532oh+zctrvj6fX1w+O3AfhRCvOfkjiSr+b//d8ZH4EZ0jdoJebAfoZ07yatJ88rRjuGnhVB7/4plHtPnqe46NOqy8+OGHTwBg1FG9Abjt0pnN75WkU5wwemDzF3dQH+/pDN+8+LiWNq1+Xa+eN4HSdIqLZlY3r7vyzHEA/P2cUYwd3Icb3ju1+b3ykjSnTqjiunMnAlA90HuyzN1LTgpmA2PKi/90Hp8/Z2JBNt53/NEAPP7FM/n3RdPZ8C/n888XHtfBpzyG9+9Fhf8IxtMnDmbFV8/mT589le9ccjwPX3cG9159Chef0DLGk4ZWNo9zJieMHljQNojOE7snFjW2UvTMK/JB0nSkc+qEwVS/05tde/bx3PXn8tKW3fQqTbN6yx4mDatkxsgB/G3925wyYTAA1QN78/gXz+S+l7Zy0YwRbNp1gOOq+3O4oRHDuGRWNfO+8yhnHzuUUycMZt64vvDtUDahze25et4EBq4oY/GxIymbOoyXvzaf8pIUh+obqShNs/Ff35P1uYtPqOZ9xx9NOmXN9wBcNLOaQ/WN9C0vYc/BOsyMkpRRUZqGJxzD+1VQNbEK9tRz3HmT+cxZEyhLp0j5n2/t46qzJvCxU8ZSUZJubvPcjedSlvYeG7d97yF6laWpKEnzxRv+AmQWfAp+/DMtBlucK9vugmnDWLN1Dw9eezoAd/xtI5OGVtK3vIQrzhjHtx94hQlD
+nLWsUP4/DmTWPj9x9n49r6shy5/+fxJ8CDMHT8INsK/LZrO/btH8olTj+E/Fs8AvH0SYPHskby2cx8XHD+C/r1LOap3GVfd+QzvHjeIr9+7mrOPHcIHThhJ+gGYVj0AXoeFxx8NfcsZ1Nd77Fsmq2+ez+H6Rvr39p7o9A/vHsNRfcp4Y+d+StMphqb3wr8VPlYtWS5FdjRXhMRe0MPGzDuCHdW/FHqXcvI4T7hnjGo5+mgS8yaqB/bm8lPGAjDQP7L99Bnjm99fecO5LY0P7wsr9JwYLSUUKvznlTb9bYuyVnfnlqZTzUXSKivafjxbSTrVPD/Wnu0mepdl74r9e7XYHdovuEfMFQPTRw6AigGcfukJWesvmzu2ebk0nTrih+9Pnz3tSGNbV8GDvmhvhDFH9WHJ8UceLYOX8vulBdlniz/+6IkAfPzUY1pWPgBp63hus1dZOuuBM03jdExVX2/Fvv0d2hDBErspl5bn7CYtyyWaTAUH3q9UmJgRdr/FuThXKH0T9JgGaE/H1dERP0GP+Ag9aTtj0rZHCNFCDAU9Wn9JE8Biu6gshAiO2Am6i1iRmudyE6KEcd+MfObjCyH+t8DEfIBFQcRP0Fu9zucLuGDasC77O3vykDy9dJHWWTsh11jp36ukTb+BkWU3eB+Th/XNshy3R9C1XGAOym4I4xjUGLrieATdVWeO77hRQoidoJ997FA+NGdUpz7zg0tP4OOnjO24YRsUY7neQm5gyszOiSN9yzuXmPXF+ZM77WNg77K82rXO/smHk44Z1OnP5Efx7adR8KuPz+mwzXXnTYogkuIgdoJeVpLiGxcex3c/OL153Z0fn8Pw/hXNN05cetIoFp04kiGV5TzxpbMA+Py5k5g7fhCjB/Xm/713Cp887Rhu/3BL2tg5U4Y2L1dVlvP//8F/JGrYGSE+J44dlHdmxU0Lp/I9P8e484SfgQIWWr813TCT7zZ86oy2U/jaI2X5HZdeOH0E500d2nHDDEpSqXD6pqizXHLbap2a2RF/P2cUz990LsP8FMmTxw3i5HEtP5JnTR6S1f77f9/V70k8iV0eehPvnzEC/geumTcBxg/miS/NA7wBb4teZWl+9fEj70DMuUNtXxNYrPkweVglz7yWve6SWdXcU7OpzfbvOW44v12xiZqNO/nkaeN4YPVbvLi5ts22vUrT3HLhuzKeBBtfcj2C8Oj+FUwcVskja7Yf8d5Dnz+da+95jsvmjuGpDTu586nX27Rx2dwxLHYj4eVnjnhv2dWncv73/nLE+u9cMp3LfrqcS989mr+u3cHdNW+0afvG901h6gv92nyvJ/HBWSP5w/Nvsv9wQ/O6fzxvEuu27eV3z3rPnr9oxgjuX7WVvYfqOW1iFb22pKEOUilrPmh78svzmj9/5yfa+F7v2MfhhkYmDq0MeYuKi9gKetKoHtibQ0f3Y/Th3qz+VMsdeNedN4kzbnmEC2eMYMvug+w+UAd4U0E//1hLHZTPnDWem/93FaMH9eaf/rCKFV89m0F9y1scrH846k2KhPfPOJqlK+C7i2YwrqoPs//5QRpapUIdU9WX3185F4ALpo/godXbeKv2IAD/vmg619y1kvKSFDe+byrc2/YPxpSj+7H8K2dz4jf+3LzODPqUl3DPFe8GvLsqH1u7nS27Pdu/v3Iu77/1r3znkuO9cggv9sxpkUymjxrAofoGfr/yzeZ1V/pz3C9u3s3abXu54oxxfCfjDJxveH8603tjBvcJINr4IUHPm3CzBwwYX9UX3rSsO/CGVFaw6ub5HX4+lTJuWujVScm84zCpnDd1GKyFMyYN4X9PPoVpI/oDsPbrC7jilyu4f9XWnJ998svzOFjXwH89s4kF04ZzDStb5rb9C3nzJg+F9d6RdW2DN6deVVnOohNHctfyto/CgeYzxSY6O6VQMEWcxnTbh2Zy1rRq/u6EavYfbuC7i6Znvf/BE0fy9XtXM6SyPIcF0RESdBFLFs8eBWsh
ZdYs5uD9sN3edP2jHSpK03xozmgAHvjcaYzwC4M1MeeYo2A9LDpxFJS1HO3968Xv4pwpQ7n8ZzWM9IuaifyYd+xQMKMk3fYYXX7KWD42d2xzPR/ReeIv6FGk34V6YbR1/OE+WiFSP6G4aJWwGMD4TzhinjVjvNuwf9bkIfz0oydy2sSqTnrK7JuAOieU/d8RTP+6jP+zl9rCzNr+qjXFUMRnH8VC7LJchOhuzIwzJw8pspTWYoolm8pOppqKrhNzQY9iJ47oi2LhpfpF7ieC4lzhD0tIDsLqmyJOW6ysyC+vXxROzAVdCCFEExJ0IYRICBL0fNEFmR5C3Mc57vGLQkiAoEeRFRJdca5oimZF5SfMwlFNYxKCj6z54wDth9I3YfVxAP3btL1WaMaMa/VX5CIBgi6EKOYsFxEd8Rb0qLJCIiGKollR+YkqkyaO9kPqmyLOctGPTXTkJehmNt/M1pjZOjO7vo33rzWzVWb2vJk9aGajgw9VCCFEe3Qo6GaWBm4FFgBTgMVmNqVVs2eBWc65dwG/Bb4VdKBCCCHaJ58j9NnAOufcq865w8BdwAWZDZxzDzvn9vsvnwSqgw2zGNAFmR5B3LOZ4h6/KIh8BH0EkFlebpO/LheXA39s6w0zW2JmNWZWs337kXWrhRBCdJ1AL4qa2aXALOCWtt53zt3unJvlnJtVVdXZwkY5UHGu4vUTWuEoCLI415G0X5yr64RZnCvIfTTY4lwtoXXRls468iafqjmbgZEZr6v9dVmY2dnAV4DTnXOHggmvI1TLpSj9JCL7SLVcAjQWoC3RHvkcoS8HJpjZWDMrAxbR6mFmZjYD+CGw0Dm3LfgwhRBCdESHgu6cqweuAu4DVgP3OOdeMrObzWyh3+wWoC/wGzNbaWYJeHqlEELEi7wKFTvnlgHLWq27IWP57IDjEkII0UnifadolOjCTA8h7uMc9/hFISRA0KPKCgnLTVRFszrwG4nTIExGla0Tgn2X80VQRgMyGVARscD27bAzp5JDAgQ9CnSVvvgIcUwiq98TJEHHHKC9WPZnPIm3oCciPa7ZESrO1RkXcbWv4lwiPOIt6EIIIZqRoAshREKQoOeNLsb0COJ+0S3u8YuCiL+gR/XIttBIUC2XSDJ2IsoKCsV+mLVcgiSoOAPa55yyXPIl/oIeBbpKX4SEOSZxHG9luYjYC7qKcxWln7AKUB3hI1QHIZlVcS4RHjEXdCGEEE1I0IUQIiFI0IUQIiFI0PNFV9d7CHEf57jHLwohAYKetOJcUfmJaeGsIx65FqfiXBE8ki8QkxmPoAuiOFfTBdYginPpB6td4i3okdVySUqNlaj8xLyWS9iZQKrlIkIi3oIuhBCiGQm6EEIkBAm6EEIkBAm6EEIkhPgLemTFubqrWFdYduNaOCtkH6FmA0VQnCsQuyrOFVfiL+hRoOJCRYiKc4WLinPFkZgLuopzdd4PKs6Vn4OQzKo4lwiPmAu6EEKIJiToQgiRECToQgiREBIg6BFlhUSVTQzgkRwAAAhlSURBVBOZnwh8RPGYu8B9hJhFkRV7UH5C6I+g4gxs31Ytl3xJgKCLnokutGVTxP2hLJfIiLegqzhXkfpRca52jKs4V7fZSD7xFnQhhBDNSNCFECIh5CXoZjbfzNaY2Tozu76N98vN7G7//afMbEzQgQohhGifDgXdzNLArcACYAqw2MymtGp2ObDLOTce+Dfgm0EHKoQQon1K8mgzG1jnnHsVwMzuAi4AVmW0uQC4yV/+LfB9MzPnIqik88zP4ZX7grdbdyD79bbVcOuc4P001me/3rMlHD+H9ma/Xv0H2FwTrI/Ww71zQ/DbcmCXv+BfJHvwa/C3/wjOfu0WKO/b8vrH50Aqn69JHux8FQZP8JY3PxNM3zTvp35//P7TUNanMJsHdrbYe/Rb8PSPuman4bAfmm/rR2dBKt15O64RLOXtXz+Ym4ysmdO/ANMuDtxsPnvqCOCNjNebgNZ7YnMb51y9
me0GBgE7MhuZ2RJgCcCoUaO6GHIGp10Hb71QuJ1cjD4ZqmcDBqUV4fkZPh3Gz4Mhx8LhvYSWa1txludj7tXw2l/D8TF0Kkw+H0bM8L6IYVA5HI6eAbOXwN6twdqumgSjTobxZ8O0D0BjXbC2Z3wYDu2B3kcFZ3f0XJh1Gezf4dkulKrJMOeT0L8aajcXZmvELJi+GJ79Vdf7csgUmDgf1t535AFQXKkYEIpZ6+gg2sw+AMx3zn3cf/1hYI5z7qqMNi/6bTb5r9f7bXa0ZRNg1qxZrqYm4CNEIYRIOGa2wjk3q6338rkouhkYmfG62l/XZhszKwH6A293PlQhhBBdJR9BXw5MMLOxZlYGLAKWtmqzFPiIv/wB4KFI5s+FEEI00+Ecuj8nfhVwH5AGfuKce8nMbgZqnHNLgR8DvzCzdcBOPNEXQggRIXldvnfOLQOWtVp3Q8byQeDvgg1NCCFEZ9CdokIIkRAk6EIIkRAk6EIIkRAk6EIIkRA6vLEoNMdm24HXuvjxwbS6C7VIUFydoxjjKsaYQHF1liTHNdo5V9XWG90m6IVgZjW57pTqThRX5yjGuIoxJlBcnaWnxqUpFyGESAgSdCGESAhxFfTbuzuAHCiuzlGMcRVjTKC4OkuPjCuWc+hCCCGOJK5H6EIIIVohQRdCiIQQO0Hv6IHVAfsaaWYPm9kqM3vJzK7x199kZpvNbKX/7/yMz3zJj22NmZ0XVtxmttHMXvD91/jrjjKzB8xsrf93oL/ezOx7vu/nzWxmhp2P+O3XmtlHcvnLM6ZJGX2y0sxqzeyz3dFfZvYTM9vmP3ylaV1g/WNmJ/j9v87/bF7PRcsR1y1m9rLv+3dmNsBfP8bMDmT0220d+c+1jV2IKbAxM6/09lP++rvNK8Pd1b66OyOmjWa2Msq+8j+XSxe6ff/CORebf3jle9cDxwBlwHPAlBD9DQdm+suVwCt4D8q+CbiujfZT/JjKgbF+rOkw4gY2AoNbrfsWcL2/fD3wTX/5fOCPeA+KPAl4yl9/FPCq/3egvzwwwLF6CxjdHf0FnAbMBF4Mo3+Ap/225n92QQFxnQuU+MvfzIhrTGa7Vnba9J9rG7sQU2BjBtwDLPKXbwM+1dW+avX+t4Ebouwrv20uXej2/StuR+jND6x2zh0Gmh5YHQrOuS3OuWf85T3Aarznp+biAuAu59wh59wGYJ0fc1RxXwD8zF/+GfD+jPU/dx5PAgPMbDhwHvCAc26nc24X8AAwP6BY5gHrnXPt3Q0cWn855x7Dq83f2l/B/eO/188596Tzvn0/z7DV6bicc/c755oelvkk3lPBctKB/1zb2KmY2qFTY+YfWZ6F9/D4vGPqKC7f7iXAr9uzEXRf+XHl0oVu37/iJuhtPbC6PYENDDMbA8wAnvJXXeWfPv0k41QtV3xhxO2A+81shXkP3wYY6pzb4i+/BQzthriaWET2l627+wuC658R/nLQ8QF8DO+IrImxZvasmT1qZqdmxJvLf65t7ApBjNkg4J2MH6yg+upUYKtzbm3Gusj7qpUudPv+FTdB7xbMrC/wX8BnnXO1wA+AccB0YAveqV/UnOKcmwksAK40s9My3/R/2bslJ9WfI10I/MZfVQz9lUV39k8uzOwrQD3wK3/VFmCUc24GcC1wp5n1y9degdtYdGPWisVkHzBE3ldt6EJB9oIgboKezwOrA8XMSvEG7VfOuf8GcM5tdc41OOcagR/hnW62F1/gcTvnNvt/twG/82PY6p+uNZ1qbos6Lp8FwDPOua1+jN3eXz5B9c9msqdFCo7PzD4KvBf4kC8G+NMab/vLK/DmqCd24D/XNnaKAMfsbbwphpJW67uMb+si4O6MeCPtq7Z0oR170e1f+Uy0F8s/vEfmvYp3MabpwsvUEP0Z3vzVd1utH56x/Dm8OUWAqWRfMHoV72JRoHEDfYDKjOW/4c1930L2RZlv+cvvIfuizNOu5aLMBrwLMgP95aMC6Le7gMu6u79o
daEsyP7hyItW5xcQ13xgFVDVql0VkPaXj8H7UrfrP9c2diGmwMYM70wt86Lop7vaVxn99Wg39lUuXej2/SsUIQzzH94V41fwfoG/ErKvU/BOm54HVvr/zgd+Abzgr1/aauf/ih/bGjKuTAcZt7/DPuf/e6nJHt585YPAWuDPGTuHAbf6vl8AZmXY+hjeha11ZIhwAbH1wTsq65+xLvL+wjsd3wLU4c1BXh5k/wCzgBf9z3wf/67rLsa1Dm8utWkfu81ve7E/viuBZ4D3deQ/1zZ2IabAxszfX5/2t/M3QHlX+8pffwdwRau2kfRVB7rQ7fuXbv0XQoiEELc5dCGEEDmQoAshREKQoAshREKQoAshREKQoAshREKQoAshREKQoAshREL4P7OjqIPAoMySAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
"
+      ]
+     },
+     "metadata": {
+      "needs_background": "light"
+     },
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "image/png": "
WG4BSgHGDwp/NiPfImtokVpA55/6RGCoaLuK7qrAyqjGWw3IIbOV9uIhfraMcztgZIjH+koTzS48bMfw4VgEkedWVzokFJB4Fg+t89vPv1/cMM0GduEX9/+CEosZ9IB+1UDhWM9HvJrr30Yd7nht7zzU6o8IGmUiDji0C+Lsg6YDhNUYLGlJswqPnjmOp/oXeB6scRm0WE1HlNbhUGxWzR5tn2df37lgwxfX8J0al9uVCnQDj3SRAcKqX1GXAyH5UZOwfInNvnFx36D90Yhn5vH/KNbP8ilvWVOdYZ8aOkat/IuANcnPa69eBKphfLn71+BeWRXmF5bxWxtv+3r1dNPIDc34cQq5vVLby0ifwdRWYYrSxCFq8pv+vrg7Gns7j52OvVPfGPB+rsU/fgjmK9fIjh3BnPrDipNMKMRenkJs7tHcO4M9fVbYP3Z9U4r7JsqTET+KfAXgW3n3NOL5/rAbwLngKvATznnBiIiwC8APwrMgL/hnPvqN7uhTrLuPnr+Z5Dp3McQywrbbaDmFTYK0IMxN35ig2zbGwPB3BLOFoZBYZkvBUQTS+PCvo8nRqH/SQKkMlS99O7NgoVgXGCTAD0uqJYzqkZA2daEU0vRUcRDiw083USQW0ykiA9q4tsTHzNUgkzmmNUOalJ4o0MEqQ0u0N6BtguOjpMd6jRg/4mQcOzI9gyNSyOK9QbiINqbMz/VAAvZ5QFS1Xzh5q8xzO/8iUNTvwL88Dc893eBzznnHgU+t/gb4EeARxc/nwZ+6V2Mf4+IZBEwJQxQwwVyKlAU55c9r0VDqFMIJ/aQxMQGgqodVbYoU01jP2FF6QEzzZhgWnmr0DhMrKhbMXp3TN1OMKEivTXxhkJpad4sifdLTCREY4OJPDkL1iHW+hxXI6Y+1Ucqg23G3hp0DttI/WOtveKUIvr6bSYnAlo3DfHIonPH/HQLGyucFjY/3iWYGsJxheQL5b/DGnpXW6KInAP+zZtW2NeBTzrnNkXkBPC7zrnHReR/Xzz+9W983TuN/+YtUYIAV9ff9JreLLrdxkymh1vK297D3XHftK1JGPmtccHF8WZRjca9rfDN48Qxrij++LiHb/zjY32r8t0I/q69SQl3gLXF47ejLTr1dgOIyKdF5Csi8pWKexPwrSoLwIxG33SSDsd90xf08Bx7m/e+nbKAtyjrLeMevvHbU9Y3k2/bSnTOORH5lk/itzDhtE4586FnAXyBQlNjAyE6qCk7AVWm2P5kRbQZHiKkdO4Dv3UimMT7Rr0LFTq3mFhRdDRVUwinDl36ADFAcuDPqXBqicYelbX/lEdHRQPBaQhmPvpRtRzJnkcVx3uO3qWcohcuqBmEcGJwSlDGoQuD1O6QX0qsj5ZMziTc+YQl7s+RV1u0rrjFfcLspNC46ZidEEziOPGFGl0Y+PJ3Hvm7JSIn3rQl3jXn3hVt0R+TaU700lVQGjedEhmDOncad3OTdGUJe2ebpc9kTD/2CPFugX75MrKxjnntIrrX8/REa6uHdEWuKEi1xo7HBKc3qG/cpHN6A7TC3tkmKwpvmV25RgR0ux0IAqr3nCH8o6tIswFVhVvqIqMprpliXrtIsL5GMJ4gaQKioNeG/QNcUSJB4CP2d0UEyorexYTeZwrMwRD1vidQgwluNsOdXsf+E89IuLK4B9VqIVoh8+KPz9HdYf+EZ9j/Auw55/7egsCy75z7H0TkPwP+Ft5KfA74Refch7/Z+MeO81vl23KcReTXgU8CyyJyE/ifgL8H/J8i8rPANeCnFi//DF5Zl/Bm/c+8mwuUJEaffZh6peW3mMpQ9GKya0NMM2a2kbH9rCc0aV13JAPDvK9pblYMz4ck+57wJNu2pFs5qjSU/YT49oTpw22iYe2zAIu7jYY1OKgzn2LZf7pF2fYFEb0L/gyarmmCuaNxp2Z0LqB7sSTenTM73STeL31GIdMEM0O4O8GFekHpJ8iCjg/ApiHXf7hNNPJw7JWvFYiD+UqIL
hyTU5p01+IUtK7lPh754v3zYUfCce7Ea+5j638Vl8beHG9m3qRXgtMaKUrMxcvUP/gBxmcieq9OCG7sYJe6uFCTn8iId3NcoAje2EREcP0OTvlzq+4mBJOSqpcQjAr0YIrNEu+nlTVyewd7dp35iQbh1Csz3J1hX34d99H3E+xPqVZbRFe2PS9iHCLGYvpN9PbQl76Wlb9mgDjy23OgcVFIvdKi7IaMTwW0r1XE23P0zgHl+VVstLjGTNN8eQsXBnzx6q/c1w87GgpLTriPnvnrHpsOsKC0c3GIFBXFiTYHj0beGIi8gaGMI5w6ZiuKpZdn5CsxyW7pI+i1xaYh+tYu1fk1bKQpuwGqdCQ7OXpzH7vcQYqKut9gejJmclLT3LTEBzWz1WABV/PGh6ph+cv7h5lw2dzDnluH2oIW9N7Yk2kuiDlxzq8UY7CtBnd+oE+ybxeJUEd2O2d0PqVq+EKO5q2S5Oqex9WXlXecy+2jG0vE1LC9h7RbYAyuNlAUSBQijYzw89dZ2zxD3W8gn/8a6v1PwuWb1N/3CK3fu460W8gXrhI8dA57ZxunNbrdgjRBf+0iQatJOBzhjIWnH/Wm/cVrcGIVPS3pfnmf1nIbdTD1KKqPPEm0PUUNRuSPnyD+ykXYWEeNcx+FmU7hhdeQIEAaGYQhFB7AetcPc9Z5ZNSdbU6UFebCGwQbp3yB+8Mn6L4+xiYhdSMg3pp4XP7mtv+yvsMiOhIrrNXdcM9+7L/DxL6YwIQ+kRmNDLOVgLwvjJ6uQBzJ9QhVQbLnqDNP0FUs+Xto3BA61ypMpJgvK0wsi9Ih/ztfEpq3fD4KgWhsyXuK6SkhHtwlC+OQ7dSkjnAkHoyaQveSoWyqQ9IyJ3eJxuxb6pp9vs0nUQ8eDhk+XSGpQW1HNG4oXyTYEWbrDl2A1EIwg95F79O9+Du/wPjg7YshjsQKU4Uhu7CDS2Ns5ON/aMFFAemVOXsfXkFeCokHDl1asu2S+XKI7PmIuLoMJhRaVyaovELyivBc77CipOyExIOSfDkiXVCl3cWLqDNNnFIkB17pzRs5B4/6THT/9YI61ZQtTfeFHR+fTEPqZkh8bd+fXdP5vTPXeiyHU+J/hwF5b4XoCwHztZDWdUv3s68x+sHHSAag54rexZKiG5BtemMJ41DFO0RsjsoKe/8P/vfYQKgyhS79qgjmlrLlK/Z3PmzBCC62xNsaVQl6DnXDswiku76ENh5agtzHGOfLinjon68yfybVDV/Al+4b8o72K7QvTDcs4UgIZn51W+3PLhP7JGgwE9afL6ja3qm32q/uaGoXlqHnAb67Ou+mWepE2PqIYFdKwmsxwUxIdzxy6s5zCgGiA+/8dy/4Qr9X/u3PM9m/cXSNjmM/7K1yDCT9HpJjhT1gcqywB0yOhJUoSUxw4gy2nSGVwbQS6lZEtDWFQFF1E/aeTjCxP8yXXvX8wLNlRTTxPBpBDu3Lc6Jru9h+i7qboOY1xXKCqh1FNyDdLj09UjsknNUEgzmzM23yvqboCNHYw7/jA2/sxCNv9gdzR7Zdk9wcUXdTXKjQo5KqnxDtznxytKrvJWKtLyqXqqZe77LzTIOiK+gSOlcMTrwDbSLlsw0xJANL6/WhLxK8eP/qlSOhMBtpZo+v4pRgUoWeW6qWxkQtbCSMTwWMzzps5GjcUGw/EyHWpz/yud8kghziYUTdWEeMo+gFVFniowu17zChqpA6udu6AUZnepjId4cwieeOsiEUHV8RefCw9qmbUhAbUDV63o2oHdWpmGDuqJotgukCl7goMBTrO1ughK0PRhRLFicOVYsH7DjIl72VWTUd0UgoWxoxHf/eG/q+c3UkFCbGV+ED6NJDneNBQdUMQSl6l0rqZkTR9wRfvQslk1MRbAHO+YpJA/FBhZTeF6paAUHuaN6Yg4j3wbZyZidTwlHtQ1szRfN2jYlj6hpWX8iZrfkvQzg1uJse5z/ZCGhfnnmfrhcRjmriPYdJPEybu
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
BlJj52Om3BvOR3HuC2OxJRImiCPPo5NAhAwSYAyFj32vLnXPtRGrI9eRHsw+3LFdC5Al75f6wCdywWq8FVL49UWre2S8UpM2fGi2wj0Xp4ghSE/mRLuVbhQsfN4gtSOcOLXW+MTimTbUfYaOnMlLDw/PeT3rbPAp1mUr2sM+1NsFFxXCJSmICdQrH1vYHwLiwAAGD9JREFUFxND+5LDJOI5f0PfXlt2hHTLUPQ08dCQrk99euhrf3LLr+pIGEyMRU0K9N4YM5uhh4WvYqp9r9WJ391g8NQS2aalToSip8k2K6wWqo6mfWFMvtwi7OeowT4oRbsymFZI59UxLlDUaUC0k2OTgHo2IdyrCPtTKEoWqp5vJ5r4+vz55wrirQnTk23KnibZLgm3xl4usawR28LGmnB76hOXtUFPCx/tEPHUss3i98yvXqN44jSDhyLaVw3iHOnVkqoT0H1xQt2LiXcr6lQjZU0wsW+vCOfPCjLJceOJT4NUNVLUzfaC4vQso1MKcb6JL9mpKXqaZDunbPtq4ez8gGo2waUJrhHwRgsooU4DwmGJaUfYWCPWEV0ZYFsh1Qmfhtm736dA/EgOfUXUfk2yWzM8GzF4cg65vAnOoUpD2M99hnl/guRFU952Q/dJWUFVM/7IWQYPRT4stu8b5MVYsueusvXhLoMHY0ysvMEO+srsrbtYjp2O20DCCHfAvfg2oBfmMdu3rfS7CUc+gek6KeUnnkYXlqoTHKbyVemwsSA1XPoxC1ZIXw2JB75S1vcoQ9AEf9MrjvYVH/ytW748IJh6XZNw4jCxUKUQTiAeGqqWosp8H1qVerrZ8Skv+eFHG8y+WLP5tCYYC4tfrw5pIupEfF1G7mWzgqlvU1LVDVIeCmys2Hwq9HSxoae5bW05TAxVR9i/z5Je8ZQTC1+vfVL1j79wy+/q6I0wpcEaVJpiJ5PDx+DMaajN9chDs9/tcDBCDkfKQXjqDv7ng2NuHB0qSV4XSTngqn/d4xuMzjsdsUd+hN2Exgi2CRkdPNYX195wv9vh4Ms5/JLewo/z4Jgbp7I3CnsdhMle9/gGhnknptcj43Qc485wbLB7DMcGu0N8O6mQdwPHBrtDfDupkHcDxwa7x3AkvETRCtXKkHaG3dnF1TV6pocZ7BGcOY1Zv3pT0lI/9jBs7WK2d9DdLmY4vC4FBahOBzU/e6jBcnhct4vMz2LW1lGdDq4sEa1heQH6e7hp0540HhOcWsVlLWwa4776vD9vliFhgBnuI0pQ7cxTPiivLotuCFGc88+NQVZXcFc2seMxqtPB7u+j5+eQMMTuDZFTJ5DxlHr9ip92RW4rNHAkDEYQoJYXse0E1esgVY3tZQQ7HcqzC5j3LJNcHDB+ZI5oWDGdDWlbR/HRB0iuTFCVIT/VIdnwxpGyxsQh9dkFwu0Jan9CcXYBGRW+EvijT6AvXMPdtwIXGzLLlUWKk22i3Rw9Lhg9MkvnK1dwsynBA2d9NfF8Fxso9GCMnclg0rjptfF8YO5AB9P48Fig2Xl6gWSwwP4JTfdSTbw5IZ9PSC7sUj90Ar1fMn1knnYcXa9KHh11gssjGpq6WzjyC+d6IWP3Rz/un7c8S6fVTYN4BONThh/5xLOMqoSpCfnaxknySUS7kzPNQ8LQoLWl/PoM8Y4Q5I790/5YmjiqSS06V+ipULctwb5C5748zjw2ptOeUlQh42sZrYUJ+SQiSUvKMqCT5Qw2Osx+NaBqC1XXoae+KzQce2lHVeFplIwvu8P57HH/MfihT32Zti7408Epnn/hFHoYYOYrspkp40GLMC0JAkv4x11fa3mbQtLjEXYE8Y5knEVEi8hXReS3m9fHTDh3AW/Frf9ZfDP6Af4lngnnIaCPZ8CBG5hwgH/V7HeMdwh3SqxyCvgh4F8A/1BEBPiLwN9sdvk14Bfw1EU/2jwH+G/AvxERuR2TgJ3JmH73R3Diy5dVDUVPSHYtkyVN2YXTn77IpIoYlyG7V3qoq
fbazKU6pGyd+XpAMPVspZNlXzXltFfSO2DUtjFEg+tUrzaE0cM17eV99q9lSKlQZUOb1G0CzNoRXQ2Z+4ZP1XjKP3+/TfqeykHnDhv60gAxHNLJbr8v4OSn1uiEOV99+Qx6N0CVgo0dpm1QaY0bRriWYekPfF+Y/a233wzxS8A/AjrN63neSSacsEv7pT2fyu/EhFv72DRGX+vTemSF6ULIq+l9BBNh9kXDKlC2hWwDxic8b2Jr25Jd2AMtuECRXQ2oW5q6pYgHNTZUXst5q6CcjQj3a/S0ppiPmX1RmCz0SGtfVifOcyWmW4oyU9QptHYs3W/2Md2EqhvhlBBvXqfLO6ANdFojZYVLQiSvyM5Z1ienWZuDlRcsYiAa+pxdlWrigWOypADF7Ll9bKBQxdugkBWRHwauOeeeFZFP3ol17wQ3MeHInDPPvwD4OfrgcmtAr1+hDbT/683HZ81jdOM5b3ium7/4hm0H+95YzB2/wTaA9DWP3HBdB/2RB51Fb4aVX371DbcffObBdTUi6+DeoHq5wZ2MsE8APyIiP9h8Rhf4Zd5BJpxj3Dne1Olwzv1j59wp59xZPEnK55xzP8E7yYRzjDvG21k4/xzvEBOOnc0Y/sDHfAdmw4lrYl+PYUOwgfDBn3iOrbyNdcK3XlwFKxBadFrjjGDHIeGupveyX3Dnc0I54zskbdA4HBHonIZW1jsFZRempwzLD2yzsTHji60ig9mOcW0DtRB2SqphxNxXfL2JLny9Rt0S0i2LCYVw6g5DUzZsaju00H9E4Z4Y8YHVdS7szbF5aY5gr+H/7Rp0r8Rux8hcSedLLcRA/ZnjhfM9heNS7e8gHBvsHsOxwe4xHIlo/YFYjlOQz3q++DoVsJ4GYnzacfapy1zYnKfbmdB/dZZwT1HNWaQUbMtCYOl9PUJVjmjoKHqKsufFdA50waqOZzMN971gTj7vCZmnKxa9mFOPQ5K1kHylRmrBxRY11Y3IjtB7ER89Ca5nE1QN8dA2HFbe2bDhAQUfXP6044GHNyjqgGuDNu5SRrgnFAsWvTLFOkHWWtQ9w9KfeGfknYh0vKvQ05r2S3uItXSdQ6YF9coMemcfO5PhvixUv7dM++GIhWcty3u77L1vHqf8F5Ts1KjKEvSHOKVwSeA5CHs+oqFKQ9WNSM5vMXnPEuGoQg8LipWMqJ9Td2LKXoQuHDasiXcKivkYp4T2S7uM3jNLdnnsScUmFbYdeaYAY1F5fV3O3l33FJ3WoIX7yg7xOc21v7zC0iuG3jPrjB9b9pXCUQjWMVlx6FLRvjT2/9PbiXT8WcAp37hgWzFVJ0IXnsquXuqCdaiy5uJPGsJXYOd9GarKMDFkG4bBgwEmFiaLiqTfovfSBD2tqDveWGUvBBeSz2lGp1bpXiqouhFVJyTcrykWWgzPhEyXhHgAcd8yWUypW4JJoMrmmCwr8tk288/tM72vQ7RXUc6FqNr5NqW9HGzTeekcTinf1WIc7W9scP6nTlG3LVWmscFJxDiqjkZVjsHDmqVnCvKFEJM0bUzHbG73Fo58xvngFyVhhDPG19Y/8agPrE5y3HCf4kMPoHNDvhDR+doGdmsHCQNYXkTKCjPfQW/0qS+ve1oIrVGdtj9/WflCmzCAyxuY0Qi9tAhlhen3QWmCkyuYxRn03pj6lQu+jqOssNs7SBJjH74Pef48kmVImoB11GuXUVmGnUyQwEcYRSs/PYpgpzn6gfsQ6xh+YBmrYeaL69RrlwlOrWLnOp6B9EvPEayexPYHfnrNj0fYPYUjP8KklSCPP46NAsT4rsuDtlSdG8puyODhkOyqYecJTe/8QWjIH69KqNpC72JN68oUGwfUWeDTKbslxUx4mDPzHp8jGhSYJKDONDYUdh4LSDc9O0AwcbT6lrLtxdiqzLcidV+devVYEWys0ZMKGp56KWt/7zI+1YJSYC1bT3uCFhOL56TX0L7inYoq9dRI4dh7xUvPjP1s85V3SMrj3cLxC
LsZx6Gp7yAcG+wew7HB7jEcCafDzqRMP/kRVOkXlHUihxL0qvbMAdH3bLO90UW3DHYQIYVgZ2pkrNEThUktvXOabNPf0PfOelGcdNNRzPiISNnz542GXlHIxN6hKGcdVc+ipgqTGcQJUnonAQe25Yg3Nb3zjSiP9Tk1gHDifB2IdZhIeR3LhpnUaeHqn9PUqwVnTuxw8co84VpMNBSvTX2yRoygx77AtfuKl7y3v32cD7uncOx0fAfhSEyJrpNSfewpL2ufqEY6XhOOvd6XMo7+IxqT+Jr2aI9DrS4T+1S/OGivG4LcHkbMJ4uB11DZqil7Gqub9H3tyK5WlN0Ap6HoKIYPQjAR0g1Hnfj6fjGeTkIXfmqeeanEJNqvDWcCdO4OaWlVZQ81w8CXNajaUfYCtt/nrz0cCcm2Ix5ep1waPAzpVaHswfy3mob2z91LtA/HOPqRjgNiFafFS71riAY145MR0b4fMVc+oQkmQrFsyC5ogqkvrCm7EPeh7EHnkiUZGIKJYXgmRtVepHQ6r6kyL84dTtxNDoMuLcPTAftnoXXVF9jk80J73esuq2aUiYX2miVoGK9N5PfVuR9ZOjdeaNU2Kk21wwbCZDlk68NeMNzEnlgl2XVMF+WwIrlz0ZHP+UgNDtznj0fYPYVjp+M7CMcGu8dwJO5hdFLqjzzlayFEKLs3NHeLeA2vOUe2DsMHIRx5Bdiqg8/0Wl9nsfzlCqfxYtuzwaGnZiIhn1WHHlz7qkGVjulCQN2C0f2+2DQa+HuWGDCR73wRA7oCVUBn3RBMrb/PivcEW9teXE7nBhNr7y025GY2UGx+2KvMmtTzCHfPK6I9R9URxqccwdjLHNsQFp6r/aL7+B52b+HIe4mkCfLexw/1S5xWnvGzP8W0Y8peSP8Rr8onTZVS0jf0HwrpXaiZzmucwOyLOeHOmKqRBD4omQ7GNSbWBPsl49MpwdQSb+eYLKTsBeQ9H8ZqbRv2HgiI+45o7MNNVUsR5I5wVHt6wHZANChRtaXsRUSDAqktNtKH8h1SW5xSoIXdxzueZu+0Y/abkG1UmJbvXasyxWRZITXMvlgS9RtW0z99m8Lb7zakrFHnL4N4g0m37d3jyYRgfpbgXJ/4d3YZ/5WPEu3VRIMCvTNCzBKtL5+n9eDqobI6V7cIN7fRD6zivvwc6slHUaMpIeACTW9ti3pjE/X+xwiu9tFrl0mffuKQUVRMSnbuGvWrF6k+9RR6akm/uQFaQVESZi0YDJF2RnIpR8IQN56gAZyFMPJJzYYzZPaZXXS3i6tr7BMPEmwNcWGAa0Wo0ZTeXBu9vg1B4AXvnLutLPDxlHgEceSnREli1MOPItZSLrfR09pPM02pQLSTc+mHe4RDmDnvlYlMrGht5lTdiLrlR2Z2eYIUFTaNsIFieH+L7gXfHKf3S/pPdJn7Wh+bRn7qaZya0ZkEq4XZc/sM3pMRTnx4Kp9VdC/Wh1Nr9qrvEi0XUlRhCAY5Ng0R60sEpFG3JdBQ1RBohu+dY7qg2H26ZvkPNOmWr6E0kWL7fRHLz+aMVyJaOzXhsPTlBs8d8RKBzswp94Hv+lnA1zmo2nm95aGlSoXRfQpxUHUc5UpF65XI9y5HXhK46lmCiSJd933R4dQxnfURkwPpKhP7yIIYyDZ8X/J4RSEGilnPvaFKwWn/fZQzlvZFTb7okAqSHSG95lm9fWrGny8aNekV4w5TKuJ8FbCNhPGSYnQ/JI/sUZYB+rk2uvRtUCaG6bIj3hVfjbzn0JXjG5/9JfZ3147wCNubkPzPLwHX20izG97vfRvnTN98lzva5+0iBRbf4jHKjW/53pEwmOum5J/8CFI7posBre2afFajKygzoW4Jw0cswUioM4cuhdaGV08XCyZxiBWSa17MLZgaposhRVcakTU/yqqOHxXxwHf628ivpWzkK7Dq1BH3hbLnqFNHsq2oW/74aAi9V6rDNeIBSbNTQjD18U6n8GwFTSWVU8J4WbPzl
AHtUBNNtKfILrvDa5muWOJthS7wujGA/d9HvLZeFTXtr67jshbpeQdakZ53niSs06KabWGjhHjgF8G9V0qUcV4efuwDuKpyxBv7/ovSQrSjoLa4Vkgxn3jSr/0C0/Uq7FJUuFCTL6fUqWL/hGb1D6eMziT0XnUk1wrGpxJMJKTXapyG1vo+2bmpJwqb6yCTApkWEIW+3PygIPaAv76qiU/Pky9ktC9b5r+wQXlyhnI2YlppwolFF5rWliWcWjrPXoEwQI9v7SUeCYM5rbALvcZhiLGRb2ZAQZ1o6lQzfKSm83JAnUE+H9O65qU9OmuW4ZmQuO+o0y7p5QlOCVU3opwJDiUKB4+2iYct4p2K4SMJ7fXy8L4zWtXowrH1wZRo6KhaUDzUIp8Tkl3H9pMh8a6jtS6MH13EhkKyVVCc7hD3PaObKmovQpDXmCjwdfbWki/FINB/rzBePUl6tZF1vFiw+eGEuXM1w/sC9luKaLDkIyVXb02/d0exRBG5ICLPicifisgzzbY5EfldEXmpeZxttouI/OuGuujrIvKhNz2/A5zDZjEu1NhYYyOFiTXRXolYR3YxYHLSUrUdC9+oCaeOeOAousoTscwIqnCHyg11W/tmBevTHNlGhZ5aJidiWls1wbjCRgob+T7lfEHIrvrQkzKQbtV+6rQQ7zp6r5ReGwZobeaowk+9B10skteHsvdiTKPrqUi2SqYrltlvOUzoE69B4di7P8bEsPVkQDRyBFOfBMW6m+SsXou3MsK+2zl3IznKzwO/75z7RRH5+eb1zwE/ADzc/H0Uz45za11BvGdVzbWwDSmKDQQTC7p02PmYvbM+M9x5VTF4X832EwHxwDFe9SI0xZwvrqkzRbGYNh0kgg2EnSdSgqlXsq0TP2WF+zA6kxIU3r3OZz2bznjZZ7frRCjbQj7vWW/CfSjmgibWCflCcri0UC1NuF9jw4P4ZwQiOO0/a3g2Rpam7D7eItoDq4W664uA6gzCoRcujQcOE2svxhrcehzdkVsvIheAp280mIi8AHzSOXdVRE4Af+Cce4+I/Nvm+a+/dr9bnf944Xwz3ol8mAP+r4g821AOASzfYIQNYLl5fkhd1OBGWqNDiMjPiMgzIvJMRXGHl3GMO50S/7xzbl1EloDfFZFzN77pnHMi8pZW4DdSF/XCJafnFpGs5aezMIAo9NSsEx+pGHz8FHG/ZnQ6Ipw44oEPxk5WImY++y3fDlRUyMQbX2rjW5DWt7HLc6i9MeWpOYJRgdrew6UJ9Pdgtod56RV2/45Xau++WlLMhbQvTQjWthl+5DQmEtqXc4IX1qCukXYbl7Vgp+9bm0RwtfGtRoAz1pcL1DUszTN4cp6yK8w9P2HwcMriH67jwoDJQ/OEkxo1rVF5hdoZQqCRK+Ebf2l8G5EOEfkFYB/4uxxPie8K3lYsUUQyQDnnRs3z7wX+Gdcpin6R11MX/QMR+Qze2di7nbEAyFrIe5uIuQOTBjf1+ppWwJVPxHQvOPqPQfdlf5iqoU49jWvZEboXa5LtHKkt0xMprctjpqcyJgsaG0H3Ys1kKSDpG5yGYN9QzIUEU8vuYwHtNUs+14S0LMyeKxifjBifFDqXLJ0LUx8K6/p2WZz37IJRgdOCTUKc4Ft+8VmIfCXj0vcFRENFsuUTpdmmpegJxaxfPLfXHOm1mtb6CBcF8PzbS68sA7/pKRIJgP/inPusiHwZ+A0R+WngIvDXmv1/B/hB4GVgAvzUm36CO3CHHSb11HZVFqBDhdXC3gMh1aMT6o0WOJguCUF+XTJqvOpItv2XV7VDdOFjftVcQt1S1C3fgzV4KGT2XNH8CDTB1DBqR1z7kMa0LKAIxlBlUGeOyUpEerUpG3c+gmESTZAb6sT3KKui9v1g1qL3i+tN6VEAzrF3f4ibK6lsSJ0Is+cgGhnymYCy52hf8j+6vftDdJmhyluLvcERCf6KyAh44W5fxx1igddwP74LOOOce8MQ5JGIdAAvO
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.539]
epoch 5, training avg cost 0.538868

[output image: base64 PNG data omitted]
/h//dIv+Fudq5tRhpS1qmH71mtM4f1KqYN3bJ+r492deb9o3aWhp6u/NeOj6Km780Qo+Mmc83//z5hbnKZvh4xeczNrkRDbsOswnL5wAwOmjBjB1RDlvHTpOg4O+RUnecfJgAEoKk3zng7Oo3n+MAX0KeXbDbopfS1A1ajC8AXcvmMKtUy7haE0dI9J+VFfdfin1zlFckODtE/VN21++8zLMjNLiAnbtrIYHcz9WzVku+Xv3lC9EXtCDJpGAEeUlUF/LvGnDmTctJXgfekfbthOGlDJhSOrLd8qwtlfAAO9NE/jrqkbD6YNCEfRGkolmYelblDr8xRnOgmTCSCYyv1AkXXwLkgkK0po2vs+00UcmCpMJOno/dr+04PoVF9A34Jdph8H00QOgpIQt985vd3+/4oKM+y4+dWjTvsVXToG31sK3aRLbuVOHMXf0pDZ/l0wYn7p4Yrs253pvAQN4z6xR8C/FYM3Hr29ZMdDyBbvpx6U47cCXlTS/rm9IWYzeERkRIjfk0vye3ZCyT0IqnRvOa9sArOnLGpwLI9hXREe7OFcgfeP3MfXRnq6rwyN6gq4sl16NDr8QmYmgoIfrr0DVHfMKp+s9ITISObVyIV+ilfcpbHQcqt+giMenCI7oPwKjI9ybiZ6gt1rP5Qs4e/wgPnre+CxaBvg1b521E3CNlSZ3Qf1AtbAbgA8Xj1fQ+dc3wfVxzvabjpVeQRcWkRP0S04dyvvPGtO0nv4wTFf4wYfP5LGPvYPb5k9pN1+6kcb0rrgQ9c/TWdZM7yX69xZ+cOeVU3o6hB4lcoJeVJDgH999WtNTbhdObvuy6T/9v4u4++pp/P5zF3DbFafy5ctPoU9hMpWrS2oYZebYgU3tl9x8LhWlRdx47nj+bWHLp+dKQkqTO3P8YBxGn276u/1dU3j+tkt47Z7LO2x3cmUpwX/5g8ukac6T7779Vbdfyot3XNZhm+5eC669ay5rvjK3A7sB9U1eZ7lkb2tyO08wZ8ugfkVNz4Sk8+kM6ZpxJLKXO1fPGAm/TuUxt5ez+8GzU08vNuaFf+yCtk/fNVJZVsyKf7i0aX3B9JGwez084HPQHXDKsDJe2AqJbtY2uOK0YVSWpXKFn/vyxcz7t2cZ1r+Es08azA//sqWpXdSvcLv6CsKy4gIOn6hrWr921igG9isCUg9G3bFkDT9b/gYAU0f0h10woE8hHO5efI39u/6eeVz5zT/x2ltHAFJPxO6Ci06phGP7u2c85nzg7DHcc/Vp/HHDbj740HLePWMkv1zV/D76RML4zS3n4hxMHFpKcUGSg0drqalvwDlHifcA2ouLL6O2oYH+JYX85fU9XDCp7UVfXIncFXpcGTWwL1NH9KeitIhxg/vy/999Glvunc+We+fz+KfPA+Dr151BZVkxL995GasXX8rdV08D4K4FUxle3vwU37DyElYvvozffuZ87rxqKlvunc9Pbuy89kqUeej6Ki6bMrTN9i9fcWqL9fQ7uqKCBF9Nq7Hzm1vO40PvGNvmrViNdW3OGj+oxfa/SRv6g9Sj8Y0UFyT56jWpx+1njBnAkpvP5fSR5SQ0NNKCksKUBL2vajT3XJ06FudNrGTLvfP5l/dNZ8u98+lblLprNWDqiHKmjSxvepipvG8hlWXFDOlfQn/voabyvoVUlBZTVJDgwslD2rxDIc5E+3ItVIKdkDFgQmUpvGk888WLWuw7dXj/pruQxjopkLoLabwT6W3MnToMNsDdV0/lsyddyPiKflw0eQgn/X3q1bfXzBzJrLED+ZuzxnC0po57frMOAMsgqE1X/t5E3sWnDIXX4Y9fuoiBAwZyx5VTAXhyzU4efm4rA/sW8flLJ/HTZW802Wj9QzB99AAWzR7Nxzu4O/SdCGVjvXfWKD5/2WSeXr+Lq3qgEFsckaCLSLJo9hjYAOV9iiiv6AekbsnX3TWP4oJEi6Grvz3vJA4dr+MbT21gWHlxG1uL3zWFcydWtNh21kmD4HWo
KC1uUeLzsqnDuCztUfmvX3cGuw6f4N7HX6Ug2VLQkwlrukoX8MMPn8n7f7iK0uIC/vMT5zB5WGq8fNHsMZ38pciW6At6GOl3gd6ytY4/2FcrhOonEBetEhZbHf8+Re1PKt/yzgmcN7GCWWMHtdn3kXNbp66mHe9Ozq9rZo6ipq6BvUdOcPNFnU2+pfeNT50TyPnvyNS/XbfT/KlnjU2V3v38ZZOaxDw7M86HWHoH0Rd0IbKgIJngzHFtxdwPigoS3Da/p9Pl8necuKy4AGpSGWOZio4Jf4j4pGhIxazCwEIomhWWnwCLczX7CNZ8YA6C6ps8TlssKynyzZbomIgLuhBCiEYk6EIIERMk6NmiCZleQtSPc9TjF7kQA0EPIyskvOJc4RTNCstPkIWjGo9JAD5ajB/7aD+Qvgmqj33o38bPa7lmzLhW/4tMxEDQhRD5nOUiwiPagh5WVkgohJAZEpqfsDJpomi/9xXn0o9NeGQl6GY2z8zWm9lGM7u1nf2fM7O1ZvaSmT1lZr3zeXQhhOhBOhV0M0uSqjt4OTAFWGRmrZ+iWAVUOedOB34B3Od3oEIIITommyv02cBG59wm51wN8AiwIL2Bc+5p59xRb/U5YBSxQxMyvYKoZzNFPX6RE9kI+khgW9p6tbctEzcCj7e3w8xuMrMVZrZi9+7d2UcphBCiU3ydFDWzDwBVwP3t7XfOfdc5V+Wcq6qs9KnovIpz5a+fwApHgT/FozKRfXGurhFkcS4/z1F/i3M1h9ZNW7rryJpsinNtB0anrY/ytrXAzC4BbgMucM6d8Ce8zlAtl7z0E4vsI9Vy8dGYj7ZER2Rzhf48MNHMxptZEbAQWJLewMxmAN8BrnLO7fI/TCGEEJ3RqaA75+qAm4EngHXAY865NWZ2l5ld5TW7HygFfm5mq81sSQZzQgghAiKreujOuaXA0lbbFqctX+JzXEIIIbpItJ8UDRNNzPQSon6cox6/yIUYCHpYWSFBuQmraFYnfkNx6ofJsLJ1ArDvMq74ZdQnkz4VEfPt3A46cyo+xEDQw0Cz9PlHgMcktPo9fuJ3zD7ai2R/RpNoC3os0uOaHKHiXF1xEVX7Ks4lgiPagi6EEKIJCboQQsQECXrWaDKmVxD1Sbeoxy9yIvqCHtYr2wIjRrVcQsnYCSkrKBD7QdZy8RO/4vTpnHPKcsmW6At6GGiWPg8J8phE8Xgry0VEXtBVnCsv/QRVgKqNj0AdBGRWxblEcERc0IUQQjQiQRdCiJggQRdCiJggQc8Wza73EqJ+nKMev8iFGAh63IpzheUnooWz2rxyLUrFuUJ4JZ8vJtNeQedHca7GCVY/inPpB6tDoi3oodVyiUuNlbD8RLyWS9CZQKrlIgIi2oIuhBCiCQm6EELEBAm6EELEBAm6EELEhOgLemjFuXqqWFdQdqNaOCtgH4FmA4VQnMsXuyrOFVWiL+hhoOJCeYiKcwWLinNFkYgLuopzdd0PKs6VnYOAzKo4lwiOiAu6EEKIRiToQp9GFkQAAAh7SURBVAgREyToQggRE2Ig6CFlhYSVTROanxB8hPGaO999BJhF0SJ2v/wE0B9+xenbua1aLtkSA0EXvRNNtLUkj/tDWS6hEW1BV3GuPPWj4lwdGFdxrh6zEX+iLehCCCGakKALIURMyErQzWyema03s41mdms7+4vN7FFv/zIzG+d3oEIIITqmU0E3syTwAHA5MAVYZGZTWjW7EdjvnJsA/AvwNb8DFUII0TEFWbSZDWx0zm0CMLNHgAXA2rQ2C4A7veVfAN8yM3MuhEo6L/wYXnvCf7u1x1qu71oHD5zlv5+Gupbrh3cE4+fEkZbr6/4btq/w10frw71vs/+f5dh+b8GbJHvqbvjLN/2zf2gHFJc2rz90KSSy+Zpkwb5NUDExtbz9BX/6puk89frjV5+Eon652Ty2r9neH+6D5d/rnp36Gi80z9b33gmJZNftuAawROr8+vaceGTNXPAlmPYe381m
c6aOBLalrVcDrc/EpjbOuTozOwgMBvakNzKzm4CbAMaMGdPNkNM4/wuw8+Xc7WRi7DkwajZgUFgSnJ/h02HCxTDkVKg5QmC5tiXvTPmYcwts/XMwPoZOhVOugJEzUl/EICgbDiNmwOyb4Mhb/tqunAxjzoEJl8C0a6Gh1l/bMz4IJw5D30H+2R07B6pugKN7UrZzpfIUOOtjUD4KDm3PzdbIKpi+CFb9pPt9OWQKTJoHG55oewEUVUoGBGLWOruINrNrgXnOub/11j8InOWcuzmtzStem2pv/XWvzZ72bAJUVVW5FSt8vkIUQoiYY2YrnXNV7e3LZlJ0OzA6bX2Ut63dNmZWAJQDe7seqhBCiO6SjaA/D0w0s/FmVgQsBJa0arMEuN5bvhb431DGz4UQQjTR6Ri6NyZ+M/AEkAS+75xbY2Z3ASucc0uAh4CHzWwjsI+U6AshhAiRrKbvnXNLgaWtti1OWz4OvNff0IQQQnQFPSkqhBAxQYIuhBAxQYIuhBAxQYIuhBAxodMHiwJzbLYb2NrNP6+g1VOoeYLi6hr5GFc+xgSKq6vEOa6xzrnK9nb0mKDngpmtyPSkVE+iuLpGPsaVjzGB4uoqvTUuDbkIIURMkKALIURMiKqgf7enA8iA4uoa+RhXPsYEiqur9Mq4IjmGLoQQoi1RvUIXQgjRCgm6EELEhMgJemcvrPbZ12gze9rM1prZGjP7tLf9TjPbbmarvX9XpP3Nl73Y1pvZ3KDiNrMtZvay53+Ft22Qmf3OzDZ4/w/0tpuZfcPz/ZKZzUyzc73XfoOZXZ/JX5YxTU7rk9VmdsjMPtMT/WVm3zezXd7LVxq3+dY/ZjbL6/+N3t9m9V60DHHdb2aver5/aWYDvO3jzOxYWr892Jn/TJ+xGzH5dswsVXp7mbf9UUuV4e5uXz2aFtMWM1sdZl95f5dJF3r8/MI5F5l/pMr3vg6cBBQBLwJTAvQ3HJjpLZcBr5F6UfadwBfaaT/Fi6kYGO/FmgwibmALUNFq233Ard7yrcDXvOUrgMdJvSjybGCZt30QsMn7f6C3PNDHY7UTGNsT/QWcD8wEXgmif4DlXlvz/vbyHOK6DCjwlr+WFte49Hat7LTrP9Nn7EZMvh0z4DFgobf8IPCJ7vZVq/3/DCwOs6+8tpl0ocfPr6hdoTe9sNo5VwM0vrA6EJxzO5xzL3jLh4F1pN6fmokFwCPOuRPOuc3ARi/msOJeAPzIW/4RcHXa9h+7FM8BA8xsODAX+J1zbp9zbj/wO2CeT7FcDLzunOvoaeDA+ss59yyp2vyt/eXcP96+/s6551zq2/fjNFtdjss596RzrvFlmc+ReitYRjrxn+kzdimmDujSMfOuLN9J6uXxWcfUWVye3euAn3Vkw+++8uLKpAs9fn5FTdDbe2F1RwLrG2Y2DpgBLPM23ezdPn0/7VYtU3xBxO2AJ81spaVevg0w1Dm3w1veCQztgbgaWUjLL1tP9xf41z8jvWW/4wP4CKkrskbGm9kqM/uDmZ2XFm8m/5k+Y3fw45gNBg6k/WD51VfnAW855zakbQu9r1rpQo+fX1ET9B7BzEqB/wQ+45w7BHwbOBmYDuwgdesXNuc652YClwN/Z2bnp+/0ftl7JCfVGyO9Cvi5tykf+qsFPdk/mTCz24A64Cfeph3AGOfcDOBzwE/NrH+29nL8jHl3zFqxiJYXDKH3VTu6kJM9P4iaoGfzwmpfMbNCUgftJ865/wJwzr3lnKt3zjUA3yN1u9lRfL7H7Zzb7v2/C/ilF8Nb3u1a463mrrDj8rgceME595YXY4/3l4df/bOdlsMiOcdnZh8G3gW83xMDvGGNvd7ySlJj1JM68Z/pM3YJH4/ZXlJDDAWttncbz9Y1wKNp8YbaV+3pQgf2wju/shloz5d/pF6Zt4nUZEzjxMvUAP0ZqfGrf221fXja8mdJjSkCTKXlhNEmUpNFvsYN9APK0pb/Qmrs+35aTsrc5y3Pp+WkzHLX
PCmzmdSEzEBveZAP/fYIcENP9xetJsr87B/aTlpdkUNc84C1QGWrdpVA0ls+idSXukP/mT5jN2Ly7ZiRulNLnxT9ZHf7Kq2//tCDfZVJF3r8/ApECIP8R2rG+DVSv8C3BezrXFK3TS8Bq71/VwAPAy9725e0Ovlv82JbT9rMtJ9xeyfsi96/NY32SI1XPgVsAH6fdnIY8IDn+2WgKs3WR0hNbG0kTYRziK0fqauy8rRtofcXqdvxHUAtqTHIG/3sH6AKeMX7m2/hPXXdzbg2khpLbTzHHvTavsc7vquBF4ArO/Of6TN2Iybfjpl3vi73PufPgeLu9pW3/YfAx1u1DaWvOtGFHj+/9Oi/EELEhKiNoQshhMiABF0IIWKCBF0IIWKCBF0IIWKCBF0IIWKCBF0IIWKCBF0IIWLC/wFswZqyQxpLBwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGwAAAD8CAYAAACSAEGOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOy9WZClyXXf9zuZ3373W2t3V2+zz2AAcAbLAKABwhRok7RM2rRIUVIoQgxGwGGFbDP0YCn84vCbFH4QSdmizAjJFBkKkpZs2aICMiIIkRKJZQgQg+Gs6O7pvbu61lt3/7bM9EPerp4BpwcDAiCrwToRFXXr1r15vy/Pzcyz/M//iHOOY3lwRP1ZX8CxfGtyrLAHTI4V9oDJscIeMDlW2AMmxwp7wOS7ojAR+WER+bqIXBKRv/vd+Iw/ryLfaT9MRDRwAfgh4CbwZeCvOOde/Y5+0J9T+W6ssA8Dl5xzl51zJfAbwI9/Fz7nz6UE34UxTwE33vT3TeC5d3pDFGQujbrgnP8RARGcgDhwItQNhRioMwinoGqHiQSxoCqLjRRSOVRlDsd1WoFzuFDBmzYSsQ4xDifgQv++OlOIvfs+UJV/HQ6cFhDQuYG6hiDw4ypBHGCtv14l/j0ii4EcTiuqpkKXYAMIcgfWYSOFE0A4/Fw9LUGEeTWkNHN5u7n6bijsXYmIfBr4NECcdnn6R/82VkOdKlTtsNrffNUQTCzMvn9CNYtQBwHxvsJpP06dOnQJYoVs09G4Y6gTRd4TVA11KtgIbAi68IoAiIZeg/NVoeg6yhVDcifABg4xgo0dUgGLaYv3hdYNQ9Xw1ycGTCSEM4uu7inWLV5/Vwl779Hkq4ZwbU65l5DcCQgnULWgalls7EjuaHQFzRsWp+HVf/Pz952374bCbgGn3/T3xuK5t4hz7peBXwZoq75r/d5lqErMcESwtoqbzUEJrqxAa+w/HAMgH3waefUyaI0rS1xVgzWoVgvV7UBd46yF0mvGDAbopT5mbx8AvbaKPRii0gRnLHY8fuuFifhVfleUBmvQjz0MwzFuPAERJMtweY5of6rYeY5oDVqDtUgUAZD9q4H//8efIfjqK/DwaeT2DmZ3D5Vl2Nns8KP02ioYgx7n953c74bREeCNjr+AV9SXgb/qnHvlfu9pS989J3/hO3odD7I87z7HyO3/6WyJzrlaRP4W8FlAA//0nZQFUK802PnJjyIGqpZgIqgzRzATbAj52ZK///F/wR9OzxOK4V+98T6qMmC1N2Z/knGqN2RWhey8sEa8L+Bgvu73JBeAjSxSKVxmoBKCiSaYCnXqqDuGqJ/znhObXBksURm/157qDNkct9DiCLRlb69J9mpC1XKoUjCpQ1UQjoRw4hDrzz7cve1QHOx9pOKvfeB5ElVxYbrK5y89jC00Safg7NI+ShyXd5aoa03yYubP6V/50n3n6rtyhjnnPgN85l2/XkPRE2zgH9cN688OBXXDEWYVGkdtFcvxmJXWlFjXFCbgqbU73Jp0aEYlVccgtUYZwWQW3S1xuzGqVRFEBlNrzFxjT1ZUd2JsCAisdcc82trh9qRDIy5pxzmzKmKjM0SJYy0Z8wfVGfJejNP+7CFwSOG/UCb2xg/CoXEjFpyCIK35kfaLXK1WmJmIh07ucnVrCa0t3XiOwqG1Zbk9ZaedgvPGyf3kz8zoeLMEc8faV0qqpsYpqDJ1aFE5BcOHG/zj1R/g1qBDElUM7rRRU42sFlwfrAGwN1F0bwrxwBLOLdNVjQsydOGwOiFfEpIp4Py4wQzqFEBzQy2zudehHkUEQ83tzPpV1DJIqXglMURbIWtfteR9hc4FG/ovWDx0h
FODGIc4kNriAn+uOYHNLOUfnv0UtycdblxfJtoKCOeCyuEP3pvgrBDshtxOmqy8DuIcm/N3mKvvujbehVRN4eYnQ1wAda8G45C0hkkIzYqnz93mH53/l7xa9ljSUz577r1sVy0eTbfYrVqEYpjZiP/3ynsZXm8jTkhOjxCB3CjSuKShHM24IK8DhtOUfOqNgtWVEX/t5Nf5aPMivzt6EuBwvHPJLptllzPxHp8fPMKXmo/jGhVoB0aQwMEwJJjqhfuxuKHFShMDj378Kv/rmd8iFMVnz6/zynyD37nzGO0457/Z+B1uVX12qxZDk/Iv1HMgUP27+8/Vd9zo+JPIsdHxVnkno+M4+PuAybHCHjA5VtgDJscKe8DkSFiJEgYEy2sQhrj5HIy99784wjUzpDaYpRZ6fwJlhSsrzO4ueqkPZYUZjQjOnsZNplDVsLoESuE2t5GNdWSWU5/oof7oEqrfw7UbsDOAqgStqR/30bTg5StIlvprGY+RZhNXVdQPnSC4vIlkKW4yw02nyKl13O0tVKvpQ2V1jSjlQ2PWIXFEffMW5pPPEl+8g13ponaHuPkc6bRxSYTkJVS1v56tXRCF7N9fLcdW4hGUP9XQ1J9EXCvDfPBZVGUpuyEm9ukQVTuchryr2f2gJbuhqdruMPwj1oehwrGPKnTfMETDGhMrqpamSoWgcFSp4AIIJw4bCOIc4cw/rmOh6ArzdUey7SMXKLAagjmoEqYbjtZV6L5R+rEbClWDDYV43weZlVlcl3GHB40TYfBYzPBxfx/hUJHdcegclIHROSE+gKLnP6f/uk8Nud/+4n3n6niFHUE59sO+h+RIbIk0Uuyz3wdaqLIAVXmjI5jVlJ2IsqOZLyuwYBIf8onGDhP5raVO/Va29vwMG2nEOqYnI/+6kaHOFOluyXgjpk6EeOwIZpaqoTCRUPSEqgHpjiOY+x3HxIKJ/eeUTSGcQftKjlhHvhyRXZ8y32gQDUofQzQWG2r0vMKGGllkm/OViN2nA4olS/sNRTR0tG4WzNYixmcUyY6jToU6g+WX/FjuC/ffEo+EwqS2RLcPcHGIdDOCgxnUBgZD9KlVoMngcUWyA9mWJZpan4rJFOHUkvc0yjhUaQj3pthGTLKvCUclxVJMdqcg2B6RpkukN0ZU/QxV1GRFTbHWwCQhYoV0339RTCRkNyvyfkA0NFgdEE4t4Z0h9XKLeL9CTeaEk5jo9oGHAgQarRQoQReLtLZz2HCJ7I6mdwE6//4S+fvP4JTf7YIpdK4WHDwcE40huTUBeAvM4RvlSCjMBQrTb+IChaot+ak2qnaw0cVqj+eI933qZfioIt0SkoFjdFYRjYRgDlUkuEBRdzPEOb8SVmKiYc1kIyFuhdhQmJ1rYwNB55pwonHK57NMLARTw8HDEbpwmChEjGP3fSHRyBHOBdtt+GtNNfZ0j2hvhllqQW1xocLGAVL71JAqar/ClkOqhlD0hc4rPQ4eCsl27cJwcuw9leAUmAjQ/h7uKvTt5EgoTCqD3hv7VWUtcdFG5iViLMzmJP0OVvdp/z8voPs9Rh89S+PGjMZmQHR1F5cXuPEEOX0Smfn0uh4kmE5KcHuf8KCDTOYwnECvjW2n/vOqGr3cIX0jx/Qa6HFOcktTrjYIZhV6lJNttYgGuU+fTOaovQPC1SVsFiOzAqkM1AYpK1DK+4Ba4QKN1Ib2pGC2vEQ0hvlGi96FkvjmAezsM/yhx+m8sMX8oT4uENR47t/3Jj/0j83VkbASW6fch575m9hIY0Oh7HhUUjixFF1N1RD2Pmh8clAgPNAEY49YqjOHzgVdQOeyId0tsVpR9APyjiKaOupYqBoe1GMjIdm3hFNHnQizNUWdQdG3RANF3XAesFOCnnkAT505uq9D93JB2QmwgXcndOlXsqocwbTGKcFp7zZIZXFasfn9KUXfYVZL1G5IMBPal31K6eA9NVIq9FyID4SlV2qkdrzwe7/IeHTzbZfZ0VDYsVn/F
jk267+H5GicYWmCfugxAKp+RnCQMz/TIpzW1InGJIrbn1A0rypU5XGIuvQgmDrxZrnOHb0LJcGkBCWIcdTNkPlySDizlC1NNPK4wiB3hJMaVRimpxLioeHOcxHZbZ/mL7oeONq7UDNd1x5WMHf0XvKWbN0IFyBSv/UFoxypDE57AI9UNS4MkKrGZjHXfqxDvlbTvBIQjRyt6zUmUcxWFMnAMjmpCSeOpVemOBH42ufvO1dHQmE2VOQnW9RNf8P5akwws9SJps40875CVWBiGD/kCEeKaHgP+JLdsSgD4zMRzdsezFk1ve+mc2+mly3v48VDS5UpxGrK1QhdOnafjnDKj1W2vU8kFvaeCqjajuy2kBxYZmfbHhVcOMqOV2Q4s1TtEPChNBv4L8tdfONsLSRfq/1eJiwwKiE4mK17X82GUHaF2YkUXd7DhLydHJ9hR1COfPBXkhj16BNIVcPuvkfVthuwO4B+B3YPmH7kPLqwRP/ua/DMk+ibO1QPrSOf/xqz//I5Wq/sIkWJyxJvwtc1rtfGXriMPrGOi0MYDJF2i/z8MvHmCGqDu72FnD2FefUC+qnHsG9cQ2UZZjhCve9xZFbA/gHVE6eJbu5jmxkuDVGzcoG5Fw8aqmowC4dXBAK/W5jXLjL9S8/RfmmP6WN9ooOK6NImBAFuNsPs7aN7PeYffpj0+tC7CNd///5zdbzCjp4cW4nfQ3IktkSaKe5970dVBhtqv9WwKAWqHbMTMVUqxCPLwSOacOxIBt7BDXJH0VJkO4Z0c0rdilGld7LLbkQ8KJicyahjoXtxSr6cABDMDXWqfUhqZpmvBHQuzSi7EePT3pqLJpa8qwlnljoRGpsV4aigbkWL6hpQxiKlBeWNDKm9wyzGIg4mZzJ2nlUku4KqQOeOdM9SNv1aUbUHy+rS0b44xokgr93fSjwSK0yMQ88rH0NT96wsX5/laL9+wMHjsPUhhSo9rDvdrahjIZxY+q9MsJEwOddEFQaxjmI5pmxr6kZI41bO0vPb7D3dQNXOuwupJtme035xm2BuiCaWze9vMFsJ6F0oaNwpybsaE7FQDISD3H+hRNCVRReG4CAHJahZhSqN/6mtV1hR0f7c6/Rec8zXHFUTwpk/gtL9mvmqEM4t0cQyPaGpOzE2C49+LNFpwWQhLlRUzQCnhDoRkr0amyr2n0qpNwrCazHT05bWVcX+EzFlG2wUwLkmRU9o3fAmthjHvK/JdmvmKyFVFlE/kx1mnG0Q+JRMHDD60BrjMwoTQ+uaY3pKqJoJ+ZLPOAdzGDyuadxylCspZVujKkfZDImmliDVBHODzUKwDhbpHRcGKKXY+umnGD0MpmFQpcaE3sWoM03Rc+y+J0CXkG47qkaALuxhTdrbyZFRWLHkc1XioI5l4TtFhDNHviw0WjmTU4qoUTLSGekdxex8hbwRYlKwoaPoKMQG6NKBwHgjQBfetxIDZUuYbAQku97pnp5IqVPv35U9y34bwrEw3fDVmShhdsqi54LUQjTVWC0ULUVQOPKuRqeKeOQhDSZWi4pKv4rEQL4iBGfHnO6NuJqscBD4eGLdcFRdgxhBTxVlB6Kpos6Uj1XeR46EwlRe03xlGxdobDtFHUwpN3qEuzNcqIlGGVthl6wEVUV0dn3Ja+u6pmo4mrdq5ssB7Ssznw/LIkwzQhZOqKoMalIyebRD+pXcT2rtz52iH1M1NeMNhS4g3bMEucUpQeeW0dkAVcHSyxP03gTbyZDaYhoRelIgeXVozoux9woCawNhAKwy2WqyudGkfwuyHYsJ/ZY+ORn5xOmeJdmvCYcFGOdLc+8jR0JhVStk9z86gVN+NaiqRZ0KYhLEwfCJmr/y0d/nq4PTPNbe5v+7+BR1qYmziqrSjJSlmY24+nqfxs2MYOaYnParyobOR9zbCWouqDJDF0I09NUrJnNUpwuWl8bMioiDQYqKDatLI3bGGXWt0dqy91xM56UmdQqqXlx3MyPZg3DsDQfwN
WG4BSgHGDwp/NiPfImtokVpA55/6RGCoaLuK7qrAyqjGWw3IIbOV9uIhfraMcztgZIjH+koTzS48bMfw4VgEkedWVzokFJB4Fg+t89vPv1/cMM0GduEX9/+CEosZ9IB+1UDhWM9HvJrr30Yd7nht7zzU6o8IGmUiDji0C+Lsg6YDhNUYLGlJswqPnjmOp/oXeB6scRm0WE1HlNbhUGxWzR5tn2df37lgwxfX8J0al9uVCnQDj3SRAcKqX1GXAyH5UZOwfInNvnFx36D90Yhn5vH/KNbP8ilvWVOdYZ8aOkat/IuANcnPa69eBKphfLn71+BeWRXmF5bxWxtv+3r1dNPIDc34cQq5vVLby0ifwdRWYYrSxCFq8pv+vrg7Gns7j52OvVPfGPB+rsU/fgjmK9fIjh3BnPrDipNMKMRenkJs7tHcO4M9fVbYP3Z9U4r7JsqTET+KfAXgW3n3NOL5/rAbwLngKvATznnBiIiwC8APwrMgL/hnPvqN7uhTrLuPnr+Z5Dp3McQywrbbaDmFTYK0IMxN35ig2zbGwPB3BLOFoZBYZkvBUQTS+PCvo8nRqH/SQKkMlS99O7NgoVgXGCTAD0uqJYzqkZA2daEU0vRUcRDiw083USQW0ykiA9q4tsTHzNUgkzmmNUOalJ4o0MEqQ0u0N6BtguOjpMd6jRg/4mQcOzI9gyNSyOK9QbiINqbMz/VAAvZ5QFS1Xzh5q8xzO/8iUNTvwL88Dc893eBzznnHgU+t/gb4EeARxc/nwZ+6V2Mf4+IZBEwJQxQwwVyKlAU55c9r0VDqFMIJ/aQxMQGgqodVbYoU01jP2FF6QEzzZhgWnmr0DhMrKhbMXp3TN1OMKEivTXxhkJpad4sifdLTCREY4OJPDkL1iHW+hxXI6Y+1Ucqg23G3hp0DttI/WOtveKUIvr6bSYnAlo3DfHIonPH/HQLGyucFjY/3iWYGsJxheQL5b/DGnpXW6KInAP+zZtW2NeBTzrnNkXkBPC7zrnHReR/Xzz+9W983TuN/+YtUYIAV9ff9JreLLrdxkymh1vK297D3XHftK1JGPmtccHF8WZRjca9rfDN48Qxrij++LiHb/zjY32r8t0I/q69SQl3gLXF47ejLTr1dgOIyKdF5Csi8pWKexPwrSoLwIxG33SSDsd90xf08Bx7m/e+nbKAtyjrLeMevvHbU9Y3k2/bSnTOORH5lk/itzDhtE4586FnAXyBQlNjAyE6qCk7AVWm2P5kRbQZHiKkdO4Dv3UimMT7Rr0LFTq3mFhRdDRVUwinDl36ADFAcuDPqXBqicYelbX/lEdHRQPBaQhmPvpRtRzJnkcVx3uO3qWcohcuqBmEcGJwSlDGoQuD1O6QX0qsj5ZMziTc+YQl7s+RV1u0rrjFfcLspNC46ZidEEziOPGFGl0Y+PJ3Hvm7JSIn3rQl3jXn3hVt0R+TaU700lVQGjedEhmDOncad3OTdGUJe2ebpc9kTD/2CPFugX75MrKxjnntIrrX8/REa6uHdEWuKEi1xo7HBKc3qG/cpHN6A7TC3tkmKwpvmV25RgR0ux0IAqr3nCH8o6tIswFVhVvqIqMprpliXrtIsL5GMJ4gaQKioNeG/QNcUSJB4CP2d0UEyorexYTeZwrMwRD1vidQgwluNsOdXsf+E89IuLK4B9VqIVoh8+KPz9HdYf+EZ9j/Auw55/7egsCy75z7H0TkPwP+Ft5KfA74Refch7/Z+MeO81vl23KcReTXgU8CyyJyE/ifgL8H/J8i8rPANeCnFi//DF5Zl/Bm/c+8mwuUJEaffZh6peW3mMpQ9GKya0NMM2a2kbH9rCc0aV13JAPDvK9pblYMz4ck+57wJNu2pFs5qjSU/YT49oTpw22iYe2zAIu7jYY1OKgzn2LZf7pF2fYFEb0L/gyarmmCuaNxp2Z0LqB7sSTenTM73STeL31GIdMEM0O4O8GFekHpJ8iCjg/ApiHXf7hNNPJw7JWvFYiD+UqIL
hyTU5p01+IUtK7lPh754v3zYUfCce7Ea+5j638Vl8beHG9m3qRXgtMaKUrMxcvUP/gBxmcieq9OCG7sYJe6uFCTn8iId3NcoAje2EREcP0OTvlzq+4mBJOSqpcQjAr0YIrNEu+nlTVyewd7dp35iQbh1Csz3J1hX34d99H3E+xPqVZbRFe2PS9iHCLGYvpN9PbQl76Wlb9mgDjy23OgcVFIvdKi7IaMTwW0r1XE23P0zgHl+VVstLjGTNN8eQsXBnzx6q/c1w87GgpLTriPnvnrHpsOsKC0c3GIFBXFiTYHj0beGIi8gaGMI5w6ZiuKpZdn5CsxyW7pI+i1xaYh+tYu1fk1bKQpuwGqdCQ7OXpzH7vcQYqKut9gejJmclLT3LTEBzWz1WABV/PGh6ph+cv7h5lw2dzDnluH2oIW9N7Yk2kuiDlxzq8UY7CtBnd+oE+ybxeJUEd2O2d0PqVq+EKO5q2S5Oqex9WXlXecy+2jG0vE1LC9h7RbYAyuNlAUSBQijYzw89dZ2zxD3W8gn/8a6v1PwuWb1N/3CK3fu460W8gXrhI8dA57ZxunNbrdgjRBf+0iQatJOBzhjIWnH/Wm/cVrcGIVPS3pfnmf1nIbdTD1KKqPPEm0PUUNRuSPnyD+ykXYWEeNcx+FmU7hhdeQIEAaGYQhFB7AetcPc9Z5ZNSdbU6UFebCGwQbp3yB+8Mn6L4+xiYhdSMg3pp4XP7mtv+yvsMiOhIrrNXdcM9+7L/DxL6YwIQ+kRmNDLOVgLwvjJ6uQBzJ9QhVQbLnqDNP0FUs+Xto3BA61ypMpJgvK0wsi9Ih/ztfEpq3fD4KgWhsyXuK6SkhHtwlC+OQ7dSkjnAkHoyaQveSoWyqQ9IyJ3eJxuxb6pp9vs0nUQ8eDhk+XSGpQW1HNG4oXyTYEWbrDl2A1EIwg95F79O9+Du/wPjg7YshjsQKU4Uhu7CDS2Ns5ON/aMFFAemVOXsfXkFeCokHDl1asu2S+XKI7PmIuLoMJhRaVyaovELyivBc77CipOyExIOSfDkiXVCl3cWLqDNNnFIkB17pzRs5B4/6THT/9YI61ZQtTfeFHR+fTEPqZkh8bd+fXdP5vTPXeiyHU+J/hwF5b4XoCwHztZDWdUv3s68x+sHHSAag54rexZKiG5BtemMJ41DFO0RsjsoKe/8P/vfYQKgyhS79qgjmlrLlK/Z3PmzBCC62xNsaVQl6DnXDswiku76ENh5agtzHGOfLinjon68yfybVDV/Al+4b8o72K7QvTDcs4UgIZn51W+3PLhP7JGgwE9afL6ja3qm32q/uaGoXlqHnAb67Ou+mWepE2PqIYFdKwmsxwUxIdzxy6s5zCgGiA+/8dy/4Qr9X/u3PM9m/cXSNjmM/7K1yDCT9HpJjhT1gcqywB0yOhJUoSUxw4gy2nSGVwbQS6lZEtDWFQFF1E/aeTjCxP8yXXvX8wLNlRTTxPBpBDu3Lc6Jru9h+i7qboOY1xXKCqh1FNyDdLj09UjsknNUEgzmzM23yvqboCNHYw7/jA2/sxCNv9gdzR7Zdk9wcUXdTXKjQo5KqnxDtznxytKrvJWKtLyqXqqZe77LzTIOiK+gSOlcMTrwDbSLlsw0xJANL6/WhLxK8eP/qlSOhMBtpZo+v4pRgUoWeW6qWxkQtbCSMTwWMzzps5GjcUGw/EyHWpz/yud8kghziYUTdWEeMo+gFVFniowu17zChqpA6udu6AUZnepjId4cwieeOsiEUHV8RefCw9qmbUhAbUDV63o2oHdWpmGDuqJotgukCl7goMBTrO1ughK0PRhRLFicOVYsH7DjIl72VWTUd0UgoWxoxHf/eG/q+c3UkFCbGV+ED6NJDneNBQdUMQSl6l0rqZkTR9wRfvQslk1MRbAHO+YpJA/FBhZTeF6paAUHuaN6Yg4j3wbZyZidTwlHtQ1szRfN2jYlj6hpWX8iZrfkvQzg1uJse5z/ZCGhfnnmfrhcRjmriPYdJPEybu
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
(base64-encoded PNG data omitted: matplotlib figure)\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.54] \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 6, training avg cost 0.538846\n" + ] + }, + { + "data": { + "image/png": "(base64-encoded PNG data omitted: matplotlib figure)\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "(base64-encoded PNG data omitted: matplotlib figure, truncated)
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
BlJj52Om3BvOR3HuC2OxJRImiCPPo5NAhAwSYAyFj32vLnXPtRGrI9eRHsw+3LFdC5Al75f6wCdywWq8FVL49UWre2S8UpM2fGi2wj0Xp4ghSE/mRLuVbhQsfN4gtSOcOLXW+MTimTbUfYaOnMlLDw/PeT3rbPAp1mUr2sM+1NsFFxXCJSmICdQrH1vYHwLiwAAGD9JREFUFxND+5LDJOI5f0PfXlt2hHTLUPQ08dCQrk99euhrf3LLr+pIGEyMRU0K9N4YM5uhh4WvYqp9r9WJ391g8NQS2aalToSip8k2K6wWqo6mfWFMvtwi7OeowT4oRbsymFZI59UxLlDUaUC0k2OTgHo2IdyrCPtTKEoWqp5vJ5r4+vz55wrirQnTk23KnibZLgm3xl4usawR28LGmnB76hOXtUFPCx/tEPHUss3i98yvXqN44jSDhyLaVw3iHOnVkqoT0H1xQt2LiXcr6lQjZU0wsW+vCOfPCjLJceOJT4NUNVLUzfaC4vQso1MKcb6JL9mpKXqaZDunbPtq4ez8gGo2waUJrhHwRgsooU4DwmGJaUfYWCPWEV0ZYFsh1Qmfhtm736dA/EgOfUXUfk2yWzM8GzF4cg65vAnOoUpD2M99hnl/guRFU952Q/dJWUFVM/7IWQYPRT4stu8b5MVYsueusvXhLoMHY0ysvMEO+srsrbtYjp2O20DCCHfAvfg2oBfmMdu3rfS7CUc+gek6KeUnnkYXlqoTHKbyVemwsSA1XPoxC1ZIXw2JB75S1vcoQ9AEf9MrjvYVH/ytW748IJh6XZNw4jCxUKUQTiAeGqqWosp8H1qVerrZ8Skv+eFHG8y+WLP5tCYYC4tfrw5pIupEfF1G7mWzgqlvU1LVDVIeCmys2Hwq9HSxoae5bW05TAxVR9i/z5Je8ZQTC1+vfVL1j79wy+/q6I0wpcEaVJpiJ5PDx+DMaajN9chDs9/tcDBCDkfKQXjqDv7ng2NuHB0qSV4XSTngqn/d4xuMzjsdsUd+hN2Exgi2CRkdPNYX195wv9vh4Ms5/JLewo/z4Jgbp7I3CnsdhMle9/gGhnknptcj43Qc485wbLB7DMcGu0N8O6mQdwPHBrtDfDupkHcDxwa7x3AkvETRCtXKkHaG3dnF1TV6pocZ7BGcOY1Zv3pT0lI/9jBs7WK2d9DdLmY4vC4FBahOBzU/e6jBcnhct4vMz2LW1lGdDq4sEa1heQH6e7hp0540HhOcWsVlLWwa4776vD9vliFhgBnuI0pQ7cxTPiivLotuCFGc88+NQVZXcFc2seMxqtPB7u+j5+eQMMTuDZFTJ5DxlHr9ip92RW4rNHAkDEYQoJYXse0E1esgVY3tZQQ7HcqzC5j3LJNcHDB+ZI5oWDGdDWlbR/HRB0iuTFCVIT/VIdnwxpGyxsQh9dkFwu0Jan9CcXYBGRW+EvijT6AvXMPdtwIXGzLLlUWKk22i3Rw9Lhg9MkvnK1dwsynBA2d9NfF8Fxso9GCMnclg0rjptfF8YO5AB9P48Fig2Xl6gWSwwP4JTfdSTbw5IZ9PSC7sUj90Ar1fMn1knnYcXa9KHh11gssjGpq6WzjyC+d6IWP3Rz/un7c8S6fVTYN4BONThh/5xLOMqoSpCfnaxknySUS7kzPNQ8LQoLWl/PoM8Y4Q5I790/5YmjiqSS06V+ipULctwb5C5748zjw2ptOeUlQh42sZrYUJ+SQiSUvKMqCT5Qw2Osx+NaBqC1XXoae+KzQce2lHVeFplIwvu8P57HH/MfihT32Zti7408Epnn/hFHoYYOYrspkp40GLMC0JAkv4x11fa3mbQtLjEXYE8Y5knEVEi8hXReS3m9fHTDh3AW/Frf9ZfDP6Af4lngnnIaCPZ8CBG5hwgH/V7HeMdwh3SqxyCvgh4F8A/1BEBPiLwN9sdvk14Bfw1EU/2jwH+G/AvxERuR2TgJ3JmH73R3Diy5dVDUVPSHYtkyVN2YXTn77IpIoYlyG7V3qoq
fbazKU6pGyd+XpAMPVspZNlXzXltFfSO2DUtjFEg+tUrzaE0cM17eV99q9lSKlQZUOb1G0CzNoRXQ2Z+4ZP1XjKP3+/TfqeykHnDhv60gAxHNLJbr8v4OSn1uiEOV99+Qx6N0CVgo0dpm1QaY0bRriWYekPfF+Y/a233wzxS8A/AjrN63neSSacsEv7pT2fyu/EhFv72DRGX+vTemSF6ULIq+l9BBNh9kXDKlC2hWwDxic8b2Jr25Jd2AMtuECRXQ2oW5q6pYgHNTZUXst5q6CcjQj3a/S0ppiPmX1RmCz0SGtfVifOcyWmW4oyU9QptHYs3W/2Md2EqhvhlBBvXqfLO6ANdFojZYVLQiSvyM5Z1ienWZuDlRcsYiAa+pxdlWrigWOypADF7Ll9bKBQxdugkBWRHwauOeeeFZFP3ol17wQ3MeHInDPPvwD4OfrgcmtAr1+hDbT/683HZ81jdOM5b3ium7/4hm0H+95YzB2/wTaA9DWP3HBdB/2RB51Fb4aVX371DbcffObBdTUi6+DeoHq5wZ2MsE8APyIiP9h8Rhf4Zd5BJpxj3Dne1Olwzv1j59wp59xZPEnK55xzP8E7yYRzjDvG21k4/xzvEBOOnc0Y/sDHfAdmw4lrYl+PYUOwgfDBn3iOrbyNdcK3XlwFKxBadFrjjGDHIeGupveyX3Dnc0I54zskbdA4HBHonIZW1jsFZRempwzLD2yzsTHji60ig9mOcW0DtRB2SqphxNxXfL2JLny9Rt0S0i2LCYVw6g5DUzZsaju00H9E4Z4Y8YHVdS7szbF5aY5gr+H/7Rp0r8Rux8hcSedLLcRA/ZnjhfM9heNS7e8gHBvsHsOxwe4xHIlo/YFYjlOQz3q++DoVsJ4GYnzacfapy1zYnKfbmdB/dZZwT1HNWaQUbMtCYOl9PUJVjmjoKHqKsufFdA50waqOZzMN971gTj7vCZmnKxa9mFOPQ5K1kHylRmrBxRY11Y3IjtB7ER89Ca5nE1QN8dA2HFbe2bDhAQUfXP6044GHNyjqgGuDNu5SRrgnFAsWvTLFOkHWWtQ9w9KfeGfknYh0vKvQ05r2S3uItXSdQ6YF9coMemcfO5PhvixUv7dM++GIhWcty3u77L1vHqf8F5Ts1KjKEvSHOKVwSeA5CHs+oqFKQ9WNSM5vMXnPEuGoQg8LipWMqJ9Td2LKXoQuHDasiXcKivkYp4T2S7uM3jNLdnnsScUmFbYdeaYAY1F5fV3O3l33FJ3WoIX7yg7xOc21v7zC0iuG3jPrjB9b9pXCUQjWMVlx6FLRvjT2/9PbiXT8WcAp37hgWzFVJ0IXnsquXuqCdaiy5uJPGsJXYOd9GarKMDFkG4bBgwEmFiaLiqTfovfSBD2tqDveWGUvBBeSz2lGp1bpXiqouhFVJyTcrykWWgzPhEyXhHgAcd8yWUypW4JJoMrmmCwr8tk288/tM72vQ7RXUc6FqNr5NqW9HGzTeekcTinf1WIc7W9scP6nTlG3LVWmscFJxDiqjkZVjsHDmqVnCvKFEJM0bUzHbG73Fo58xvngFyVhhDPG19Y/8agPrE5y3HCf4kMPoHNDvhDR+doGdmsHCQNYXkTKCjPfQW/0qS+ve1oIrVGdtj9/WflCmzCAyxuY0Qi9tAhlhen3QWmCkyuYxRn03pj6lQu+jqOssNs7SBJjH74Pef48kmVImoB11GuXUVmGnUyQwEcYRSs/PYpgpzn6gfsQ6xh+YBmrYeaL69RrlwlOrWLnOp6B9EvPEayexPYHfnrNj0fYPYUjP8KklSCPP46NAsT4rsuDtlSdG8puyODhkOyqYecJTe/8QWjIH69KqNpC72JN68oUGwfUWeDTKbslxUx4mDPzHp8jGhSYJKDONDYUdh4LSDc9O0AwcbT6lrLtxdiqzLcidV+devVYEWys0ZMKGp56KWt/7zI+1YJSYC1bT3uCFhOL56TX0L7inYoq9dRI4dh7xUvPjP1s85V3SMrj3cLxC
LsZx6Gp7yAcG+wew7HB7jEcCafDzqRMP/kRVOkXlHUihxL0qvbMAdH3bLO90UW3DHYQIYVgZ2pkrNEThUktvXOabNPf0PfOelGcdNNRzPiISNnz542GXlHIxN6hKGcdVc+ipgqTGcQJUnonAQe25Yg3Nb3zjSiP9Tk1gHDifB2IdZhIeR3LhpnUaeHqn9PUqwVnTuxw8co84VpMNBSvTX2yRoygx77AtfuKl7y3v32cD7uncOx0fAfhSEyJrpNSfewpL2ufqEY6XhOOvd6XMo7+IxqT+Jr2aI9DrS4T+1S/OGivG4LcHkbMJ4uB11DZqil7Gqub9H3tyK5WlN0Ap6HoKIYPQjAR0g1Hnfj6fjGeTkIXfmqeeanEJNqvDWcCdO4OaWlVZQ81w8CXNajaUfYCtt/nrz0cCcm2Ix5ep1waPAzpVaHswfy3mob2z91LtA/HOPqRjgNiFafFS71riAY145MR0b4fMVc+oQkmQrFsyC5ogqkvrCm7EPeh7EHnkiUZGIKJYXgmRtVepHQ6r6kyL84dTtxNDoMuLcPTAftnoXXVF9jk80J73esuq2aUiYX2miVoGK9N5PfVuR9ZOjdeaNU2Kk21wwbCZDlk68NeMNzEnlgl2XVMF+WwIrlz0ZHP+UgNDtznj0fYPYVjp+M7CMcGu8dwJO5hdFLqjzzlayFEKLs3NHeLeA2vOUe2DsMHIRx5Bdiqg8/0Wl9nsfzlCqfxYtuzwaGnZiIhn1WHHlz7qkGVjulCQN2C0f2+2DQa+HuWGDCR73wRA7oCVUBn3RBMrb/PivcEW9teXE7nBhNr7y025GY2UGx+2KvMmtTzCHfPK6I9R9URxqccwdjLHNsQFp6r/aL7+B52b+HIe4mkCfLexw/1S5xWnvGzP8W0Y8peSP8Rr8onTZVS0jf0HwrpXaiZzmucwOyLOeHOmKqRBD4omQ7GNSbWBPsl49MpwdQSb+eYLKTsBeQ9H8ZqbRv2HgiI+45o7MNNVUsR5I5wVHt6wHZANChRtaXsRUSDAqktNtKH8h1SW5xSoIXdxzueZu+0Y/abkG1UmJbvXasyxWRZITXMvlgS9RtW0z99m8Lb7zakrFHnL4N4g0m37d3jyYRgfpbgXJ/4d3YZ/5WPEu3VRIMCvTNCzBKtL5+n9eDqobI6V7cIN7fRD6zivvwc6slHUaMpIeACTW9ti3pjE/X+xwiu9tFrl0mffuKQUVRMSnbuGvWrF6k+9RR6akm/uQFaQVESZi0YDJF2RnIpR8IQN56gAZyFMPJJzYYzZPaZXXS3i6tr7BMPEmwNcWGAa0Wo0ZTeXBu9vg1B4AXvnLutLPDxlHgEceSnREli1MOPItZSLrfR09pPM02pQLSTc+mHe4RDmDnvlYlMrGht5lTdiLrlR2Z2eYIUFTaNsIFieH+L7gXfHKf3S/pPdJn7Wh+bRn7qaZya0ZkEq4XZc/sM3pMRTnx4Kp9VdC/Wh1Nr9qrvEi0XUlRhCAY5Ng0R60sEpFG3JdBQ1RBohu+dY7qg2H26ZvkPNOmWr6E0kWL7fRHLz+aMVyJaOzXhsPTlBs8d8RKBzswp94Hv+lnA1zmo2nm95aGlSoXRfQpxUHUc5UpF65XI9y5HXhK46lmCiSJd933R4dQxnfURkwPpKhP7yIIYyDZ8X/J4RSEGilnPvaFKwWn/fZQzlvZFTb7okAqSHSG95lm9fWrGny8aNekV4w5TKuJ8FbCNhPGSYnQ/JI/sUZYB+rk2uvRtUCaG6bIj3hVfjbzn0JXjG5/9JfZ3147wCNubkPzPLwHX20izG97vfRvnTN98lzva5+0iBRbf4jHKjW/53pEwmOum5J/8CFI7posBre2afFajKygzoW4Jw0cswUioM4cuhdaGV08XCyZxiBWSa17MLZgaposhRVcakTU/yqqOHxXxwHf628ivpWzkK7Dq1BH3hbLnqFNHsq2oW/74aAi9V6rDNeIBSbNTQjD18U6n8GwFTSWVU8J4WbPzl
AHtUBNNtKfILrvDa5muWOJthS7wujGA/d9HvLZeFTXtr67jshbpeQdakZ53niSs06KabWGjhHjgF8G9V0qUcV4efuwDuKpyxBv7/ovSQrSjoLa4Vkgxn3jSr/0C0/Uq7FJUuFCTL6fUqWL/hGb1D6eMziT0XnUk1wrGpxJMJKTXapyG1vo+2bmpJwqb6yCTApkWEIW+3PygIPaAv76qiU/Pky9ktC9b5r+wQXlyhnI2YlppwolFF5rWliWcWjrPXoEwQI9v7SUeCYM5rbALvcZhiLGRb2ZAQZ1o6lQzfKSm83JAnUE+H9O65qU9OmuW4ZmQuO+o0y7p5QlOCVU3opwJDiUKB4+2iYct4p2K4SMJ7fXy8L4zWtXowrH1wZRo6KhaUDzUIp8Tkl3H9pMh8a6jtS6MH13EhkKyVVCc7hD3PaObKmovQpDXmCjwdfbWki/FINB/rzBePUl6tZF1vFiw+eGEuXM1w/sC9luKaLDkIyVXb02/d0exRBG5ICLPicifisgzzbY5EfldEXmpeZxttouI/OuGuujrIvKhNz2/A5zDZjEu1NhYYyOFiTXRXolYR3YxYHLSUrUdC9+oCaeOeOAousoTscwIqnCHyg11W/tmBevTHNlGhZ5aJidiWls1wbjCRgob+T7lfEHIrvrQkzKQbtV+6rQQ7zp6r5ReGwZobeaowk+9B10skteHsvdiTKPrqUi2SqYrltlvOUzoE69B4di7P8bEsPVkQDRyBFOfBMW6m+SsXou3MsK+2zl3IznKzwO/75z7RRH5+eb1zwE/ADzc/H0Uz45za11BvGdVzbWwDSmKDQQTC7p02PmYvbM+M9x5VTF4X832EwHxwDFe9SI0xZwvrqkzRbGYNh0kgg2EnSdSgqlXsq0TP2WF+zA6kxIU3r3OZz2bznjZZ7frRCjbQj7vWW/CfSjmgibWCflCcri0UC1NuF9jw4P4ZwQiOO0/a3g2Rpam7D7eItoDq4W664uA6gzCoRcujQcOE2svxhrcehzdkVsvIheAp280mIi8AHzSOXdVRE4Af+Cce4+I/Nvm+a+/dr9bnf944Xwz3ol8mAP+r4g821AOASzfYIQNYLl5fkhd1OBGWqNDiMjPiMgzIvJMRXGHl3GMO50S/7xzbl1EloDfFZFzN77pnHMi8pZW4DdSF/XCJafnFpGs5aezMIAo9NSsEx+pGHz8FHG/ZnQ6Ipw44oEPxk5WImY++y3fDlRUyMQbX2rjW5DWt7HLc6i9MeWpOYJRgdrew6UJ9Pdgtod56RV2/45Xau++WlLMhbQvTQjWthl+5DQmEtqXc4IX1qCukXYbl7Vgp+9bm0RwtfGtRoAz1pcL1DUszTN4cp6yK8w9P2HwcMriH67jwoDJQ/OEkxo1rVF5hdoZQqCRK+Ebf2l8G5EOEfkFYB/4uxxPie8K3lYsUUQyQDnnRs3z7wX+Gdcpin6R11MX/QMR+Qze2di7nbEAyFrIe5uIuQOTBjf1+ppWwJVPxHQvOPqPQfdlf5iqoU49jWvZEboXa5LtHKkt0xMprctjpqcyJgsaG0H3Ys1kKSDpG5yGYN9QzIUEU8vuYwHtNUs+14S0LMyeKxifjBifFDqXLJ0LUx8K6/p2WZz37IJRgdOCTUKc4Ft+8VmIfCXj0vcFRENFsuUTpdmmpegJxaxfPLfXHOm1mtb6CBcF8PzbS68sA7/pKRIJgP/inPusiHwZ+A0R+WngIvDXmv1/B/hB4GVgAvzUm36CO3CHHSb11HZVFqBDhdXC3gMh1aMT6o0WOJguCUF+XTJqvOpItv2XV7VDdOFjftVcQt1S1C3fgzV4KGT2XNH8CDTB1DBqR1z7kMa0LKAIxlBlUGeOyUpEerUpG3c+gmESTZAb6sT3KKui9v1g1qL3i+tN6VEAzrF3f4ibK6lsSJ0Is+cgGhnymYCy52hf8j+6vftDdJmhyluLvcERCf6KyAh44W5fxx1igddwP74LOOOce8MQ5JGIdAAvO
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.54] \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 7, training avg cost 0.538854\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO2deXgc5ZWv39Otzfsqb7K8gDdsFtsIm2uz2AaClwQDyQAOmbAlMCEEEkh4nJAQLgkzAW5myQwDCZONTAhh5obEc3EwCUtICGDMjjEGY7PYGLzg3bK1ffePLkmtllpqqatKXaXf+zx6VFX99Tmnv/r611VfnTplzjmEEEJEn0R3ByCEEMIfJOhCCBETJOhCCBETJOhCCBETJOhCCBETirrL8dChQ924ceO6y70QQkSS5557bodzrryt17pN0MeNG8eaNWu6y70QQkQSM3sn22uachFCiJggQRdCiJggQRdCiJggQRdCiJggQRdCiJjQoaCb2U/MbJuZvZrldTOzH5jZBjN72cxm+h+mEEKIjsjlCP1nwMJ2Xl8ETPT+LgfuzD8sIYQQnaXDPHTn3BNmNq6dJkuBe1yqDu/TZjbQzEY657b6FGNL3nkK3no0tZwshuMvgb5t5tjnz3M/hz2bU8tTlsCo6cH4efU3sG1davnI+TB2jv8+GhrgmbugehckkjD9QhhY6b+f11fC+y+kliedCaOr/LVfW536HDUHoaQPzP47KC7zz/6ezfDCf0JDPQw7Co4+1z/bm5+DNx5KLU87G4ZP88fump/C3vehz1CYdTmY5Wdv059h0xNQ0htmXZH631Vevh92vAkjjoapS7tm442HYfOzqe/flCVdj6UH4MeNRRXAe2nrm71trQTdzC4ndRTPmDFjuuZt82p44nbAq+PeewiccFnXbLVH9W74n6ub17e/Duf/wn8/ACu+BDX7U8vvPAmXrPTfx471sOrrzeuWgFOv99/Pyq/C3i2p5fdfgM/8t7/2330K/nhT83rF8TD+ZP/sv3gvPP4PqeWSfv4K+hO3NQv6ns1wjg8nswd2wv/7cvP6pDNh0Lj8bP7x27DludTy8GNg4uldt/XAFeAaoNegrgv676+HXZug3ygJegeEelHUOfcj51yVc66qvLyLR9Vzr4GbdsPX3vKMNvgXYDqNdhfdBsOPDs4PpI4G51wN409NLQflA+C8X7RcD8LPzItg9AngAvDR4O2HM76T+u+3j8Z+mfOlYGyPmpkSXL9sN9qpnN3sI18a6qHPsJb2u0rj96Yhj+9PYwxBjKeY4YegbwHSz91He9uEEEKEiB+CvgL4rJftciKwJ7D5cyGEEFnpcA7dzH4FzAOGmtlm4NtAMYBz7i5gJbAY2AAcBC4JKlghhBDZySXLZVkHrzvgi75FJIQQoktE/05R58KxG5SflPEsy0H5CMlPIH0W9H4JMv4AbAfdx/nYdz6N68a3BvodjAfRF3TRM8k31zp2+Nwf6t9IEmFBD2vAWTi+zML5EoXmI2g/EbUfVN/4vl8l6FEkwoIuhBAiHQm6EELEBAm6EELEhBgIelhZIQHigs4MacNuKH4C
vAaUZ7QrB5Le8hGkvtTt+s/2GbsQk2/7jNSZWvpF0Su72ldp/fWnbuyrbLrQ7eMrECEM8o/UFeM3SP0C3xCwr5NInTa9DLzo/S0GfgG84m1fkTH4b/BiW0/alWk/4/YG7Eve39pGe6TmKx8B3gT+mDY4DLjD8/0KUJVm61JSF7Y2kCbCecTWh9RR2YC0baH3F6nT8a1ALak5yMv87B+gCnjVe8+/4d113cW4NpCaS20cY3d5bT/p7d8XgeeBT3TkP9tn7EJMvu0zb7yu9j7nfwGlXe0rb/vPgL/LaBtKX3WgC90+vnTrvxBCxISozaELIYTIggRdCCFiggRdCCFiggRdCCFiggRdCCFiggRdCCFiggRdCCFiwv8HiYCUqofY0rkAAAAASUVORK5CYII=\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGwAAAD8CAYAAACSAEGOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOy9WZClyXXf9zuZ3373W2t3V2+zz2AAcAbLAKABwhRok7RM2rRIUVIoQgxGwGGFbDP0YCn84vCbFH4QSdmizAjJFBkKkpZs2aICMiIIkRKJZQgQg+Gs6O7pvbu61lt3/7bM9EPerp4BpwcDAiCrwToRFXXr1r15vy/Pzcyz/M//iHOOY3lwRP1ZX8CxfGtyrLAHTI4V9oDJscIeMDlW2AMmxwp7wOS7ojAR+WER+bqIXBKRv/vd+Iw/ryLfaT9MRDRwAfgh4CbwZeCvOOde/Y5+0J9T+W6ssA8Dl5xzl51zJfAbwI9/Fz7nz6UE34UxTwE33vT3TeC5d3pDFGQujbrgnP8RARGcgDhwItQNhRioMwinoGqHiQSxoCqLjRRSOVRlDsd1WoFzuFDBmzYSsQ4xDifgQv++OlOIvfs+UJV/HQ6cFhDQuYG6hiDw4ypBHGCtv14l/j0ii4EcTiuqpkKXYAMIcgfWYSOFE0A4/Fw9LUGEeTWkNHN5u7n6bijsXYmIfBr4NECcdnn6R/82VkOdKlTtsNrffNUQTCzMvn9CNYtQBwHxvsJpP06dOnQJYoVs09G4Y6gTRd4TVA11KtgIbAi68IoAiIZeg/NVoeg6yhVDcifABg4xgo0dUgGLaYv3hdYNQ9Xw1ycGTCSEM4uu7inWLV5/Vwl779Hkq4ZwbU65l5DcCQgnULWgalls7EjuaHQFzRsWp+HVf/Pz952374bCbgGn3/T3xuK5t4hz7peBXwZoq75r/d5lqErMcESwtoqbzUEJrqxAa+w/HAMgH3waefUyaI0rS1xVgzWoVgvV7UBd46yF0mvGDAbopT5mbx8AvbaKPRii0gRnLHY8fuuFifhVfleUBmvQjz0MwzFuPAERJMtweY5of6rYeY5oDVqDtUgUAZD9q4H//8efIfjqK/DwaeT2DmZ3D5Vl2Nns8KP02ioYgx7n953c74bREeCNjr+AV9SXgb/qnHvlfu9pS989J3/hO3odD7I87z7HyO3/6WyJzrlaRP4W8FlAA//0nZQFUK802PnJjyIGqpZgIqgzRzATbAj52ZK///F/wR9OzxOK4V+98T6qMmC1N2Z/knGqN2RWhey8sEa8L+Bgvu73JBeAjSxSKVxmoBKCiSaYCnXqqDuGqJ/znhObXBksURm/157qDNkct9DiCLRlb69J9mpC1XKoUjCpQ1UQjoRw4hDrzz7cve1QHOx9pOKvfeB5ElVxYbrK5y89jC00Safg7NI+ShyXd5aoa03yYubP6V/50n3n6rtyhjnnPgN85l2/XkPRE2zgH9cN688OBXXDEWYVGkdtFcvxmJXWlFjXFCbgqbU73Jp0aEYlVccgtUYZwWQW3S1xuzGqVRFEBlNrzFxjT1ZUd2JsCAisdcc82trh9qRDIy5pxzmzKmKjM0SJYy0Z8wfVGfJejNP+7CFwSOG/UCb2xg/CoXEjFpyCIK35kfaLXK1WmJmIh07ucnVrCa0t3XiOwqG1Zbk9ZaedgvPGyf3kz8zoeLMEc8faV0qqpsYpqDJ1aFE5BcOHG/zj1R/g1qBDElUM7rRRU42sFlwfrAGwN1F0bwrxwBLOLdNVjQsydOGwOiFfEpIp4Py4wQzqFEBzQy2zudehHkUEQ83tzPpV1DJIqXglMURbIWtfteR9hc4FG/ovWDx0h
FODGIc4kNriAn+uOYHNLOUfnv0UtycdblxfJtoKCOeCyuEP3pvgrBDshtxOmqy8DuIcm/N3mKvvujbehVRN4eYnQ1wAda8G45C0hkkIzYqnz93mH53/l7xa9ljSUz577r1sVy0eTbfYrVqEYpjZiP/3ynsZXm8jTkhOjxCB3CjSuKShHM24IK8DhtOUfOqNgtWVEX/t5Nf5aPMivzt6EuBwvHPJLptllzPxHp8fPMKXmo/jGhVoB0aQwMEwJJjqhfuxuKHFShMDj378Kv/rmd8iFMVnz6/zynyD37nzGO0457/Z+B1uVX12qxZDk/Iv1HMgUP27+8/Vd9zo+JPIsdHxVnkno+M4+PuAybHCHjA5VtgDJscKe8DkSFiJEgYEy2sQhrj5HIy99784wjUzpDaYpRZ6fwJlhSsrzO4ueqkPZYUZjQjOnsZNplDVsLoESuE2t5GNdWSWU5/oof7oEqrfw7UbsDOAqgStqR/30bTg5StIlvprGY+RZhNXVdQPnSC4vIlkKW4yw02nyKl13O0tVKvpQ2V1jSjlQ2PWIXFEffMW5pPPEl+8g13ponaHuPkc6bRxSYTkJVS1v56tXRCF7N9fLcdW4hGUP9XQ1J9EXCvDfPBZVGUpuyEm9ukQVTuchryr2f2gJbuhqdruMPwj1oehwrGPKnTfMETDGhMrqpamSoWgcFSp4AIIJw4bCOIc4cw/rmOh6ArzdUey7SMXKLAagjmoEqYbjtZV6L5R+rEbClWDDYV43weZlVlcl3GHB40TYfBYzPBxfx/hUJHdcegclIHROSE+gKLnP6f/uk8Nud/+4n3n6niFHUE59sO+h+RIbIk0Uuyz3wdaqLIAVXmjI5jVlJ2IsqOZLyuwYBIf8onGDhP5raVO/Va29vwMG2nEOqYnI/+6kaHOFOluyXgjpk6EeOwIZpaqoTCRUPSEqgHpjiOY+x3HxIKJ/eeUTSGcQftKjlhHvhyRXZ8y32gQDUofQzQWG2r0vMKGGllkm/OViN2nA4olS/sNRTR0tG4WzNYixmcUyY6jToU6g+WX/FjuC/ffEo+EwqS2RLcPcHGIdDOCgxnUBgZD9KlVoMngcUWyA9mWJZpan4rJFOHUkvc0yjhUaQj3pthGTLKvCUclxVJMdqcg2B6RpkukN0ZU/QxV1GRFTbHWwCQhYoV0339RTCRkNyvyfkA0NFgdEE4t4Z0h9XKLeL9CTeaEk5jo9oGHAgQarRQoQReLtLZz2HCJ7I6mdwE6//4S+fvP4JTf7YIpdK4WHDwcE40huTUBeAvM4RvlSCjMBQrTb+IChaot+ak2qnaw0cVqj+eI933qZfioIt0SkoFjdFYRjYRgDlUkuEBRdzPEOb8SVmKiYc1kIyFuhdhQmJ1rYwNB55pwonHK57NMLARTw8HDEbpwmChEjGP3fSHRyBHOBdtt+GtNNfZ0j2hvhllqQW1xocLGAVL71JAqar/ClkOqhlD0hc4rPQ4eCsl27cJwcuw9leAUmAjQ/h7uKvTt5EgoTCqD3hv7VWUtcdFG5iViLMzmJP0OVvdp/z8voPs9Rh89S+PGjMZmQHR1F5cXuPEEOX0Smfn0uh4kmE5KcHuf8KCDTOYwnECvjW2n/vOqGr3cIX0jx/Qa6HFOcktTrjYIZhV6lJNttYgGuU+fTOaovQPC1SVsFiOzAqkM1AYpK1DK+4Ba4QKN1Ib2pGC2vEQ0hvlGi96FkvjmAezsM/yhx+m8sMX8oT4uENR47t/3Jj/0j83VkbASW6fch575m9hIY0Oh7HhUUjixFF1N1RD2Pmh8clAgPNAEY49YqjOHzgVdQOeyId0tsVpR9APyjiKaOupYqBoe1GMjIdm3hFNHnQizNUWdQdG3RANF3XAesFOCnnkAT505uq9D93JB2QmwgXcndOlXsqocwbTGKcFp7zZIZXFasfn9KUXfYVZL1G5IMBPal31K6eA9NVIq9FyID4SlV2qkdrzwe7/IeHTzbZfZ0VDYsVn/F
jk267+H5GicYWmCfugxAKp+RnCQMz/TIpzW1InGJIrbn1A0rypU5XGIuvQgmDrxZrnOHb0LJcGkBCWIcdTNkPlySDizlC1NNPK4wiB3hJMaVRimpxLioeHOcxHZbZ/mL7oeONq7UDNd1x5WMHf0XvKWbN0IFyBSv/UFoxypDE57AI9UNS4MkKrGZjHXfqxDvlbTvBIQjRyt6zUmUcxWFMnAMjmpCSeOpVemOBH42ufvO1dHQmE2VOQnW9RNf8P5akwws9SJps40875CVWBiGD/kCEeKaHgP+JLdsSgD4zMRzdsezFk1ve+mc2+mly3v48VDS5UpxGrK1QhdOnafjnDKj1W2vU8kFvaeCqjajuy2kBxYZmfbHhVcOMqOV2Q4s1TtEPChNBv4L8tdfONsLSRfq/1eJiwwKiE4mK17X82GUHaF2YkUXd7DhLydHJ9hR1COfPBXkhj16BNIVcPuvkfVthuwO4B+B3YPmH7kPLqwRP/ua/DMk+ibO1QPrSOf/xqz//I5Wq/sIkWJyxJvwtc1rtfGXriMPrGOi0MYDJF2i/z8MvHmCGqDu72FnD2FefUC+qnHsG9cQ2UZZjhCve9xZFbA/gHVE6eJbu5jmxkuDVGzcoG5Fw8aqmowC4dXBAK/W5jXLjL9S8/RfmmP6WN9ooOK6NImBAFuNsPs7aN7PeYffpj0+tC7CNd///5zdbzCjp4cW4nfQ3IktkSaKe5970dVBhtqv9WwKAWqHbMTMVUqxCPLwSOacOxIBt7BDXJH0VJkO4Z0c0rdilGld7LLbkQ8KJicyahjoXtxSr6cABDMDXWqfUhqZpmvBHQuzSi7EePT3pqLJpa8qwlnljoRGpsV4aigbkWL6hpQxiKlBeWNDKm9wyzGIg4mZzJ2nlUku4KqQOeOdM9SNv1aUbUHy+rS0b44xokgr93fSjwSK0yMQ88rH0NT96wsX5/laL9+wMHjsPUhhSo9rDvdrahjIZxY+q9MsJEwOddEFQaxjmI5pmxr6kZI41bO0vPb7D3dQNXOuwupJtme035xm2BuiCaWze9vMFsJ6F0oaNwpybsaE7FQDISD3H+hRNCVRReG4CAHJahZhSqN/6mtV1hR0f7c6/Rec8zXHFUTwpk/gtL9mvmqEM4t0cQyPaGpOzE2C49+LNFpwWQhLlRUzQCnhDoRkr0amyr2n0qpNwrCazHT05bWVcX+EzFlG2wUwLkmRU9o3fAmthjHvK/JdmvmKyFVFlE/kx1mnG0Q+JRMHDD60BrjMwoTQ+uaY3pKqJoJ+ZLPOAdzGDyuadxylCspZVujKkfZDImmliDVBHODzUKwDhbpHRcGKKXY+umnGD0MpmFQpcaE3sWoM03Rc+y+J0CXkG47qkaALuxhTdrbyZFRWLHkc1XioI5l4TtFhDNHviw0WjmTU4qoUTLSGekdxex8hbwRYlKwoaPoKMQG6NKBwHgjQBfetxIDZUuYbAQku97pnp5IqVPv35U9y34bwrEw3fDVmShhdsqi54LUQjTVWC0ULUVQOPKuRqeKeOQhDSZWi4pKv4rEQL4iBGfHnO6NuJqscBD4eGLdcFRdgxhBTxVlB6Kpos6Uj1XeR46EwlRe03xlGxdobDtFHUwpN3qEuzNcqIlGGVthl6wEVUV0dn3Ja+u6pmo4mrdq5ssB7Ssznw/LIkwzQhZOqKoMalIyebRD+pXcT2rtz52iH1M1NeMNhS4g3bMEucUpQeeW0dkAVcHSyxP03gTbyZDaYhoRelIgeXVozoux9woCawNhAKwy2WqyudGkfwuyHYsJ/ZY+ORn5xOmeJdmvCYcFGOdLc+8jR0JhVStk9z86gVN+NaiqRZ0KYhLEwfCJmr/y0d/nq4PTPNbe5v+7+BR1qYmziqrSjJSlmY24+nqfxs2MYOaYnParyobOR9zbCWouqDJDF0I09NUrJnNUpwuWl8bMioiDQYqKDatLI3bGGXWt0dqy91xM56UmdQqqXlx3MyPZg3DsDQfwN
WG4BSgHGDwp/NiPfImtokVpA55/6RGCoaLuK7qrAyqjGWw3IIbOV9uIhfraMcztgZIjH+koTzS48bMfw4VgEkedWVzokFJB4Fg+t89vPv1/cMM0GduEX9/+CEosZ9IB+1UDhWM9HvJrr30Yd7nht7zzU6o8IGmUiDji0C+Lsg6YDhNUYLGlJswqPnjmOp/oXeB6scRm0WE1HlNbhUGxWzR5tn2df37lgwxfX8J0al9uVCnQDj3SRAcKqX1GXAyH5UZOwfInNvnFx36D90Yhn5vH/KNbP8ilvWVOdYZ8aOkat/IuANcnPa69eBKphfLn71+BeWRXmF5bxWxtv+3r1dNPIDc34cQq5vVLby0ifwdRWYYrSxCFq8pv+vrg7Gns7j52OvVPfGPB+rsU/fgjmK9fIjh3BnPrDipNMKMRenkJs7tHcO4M9fVbYP3Z9U4r7JsqTET+KfAXgW3n3NOL5/rAbwLngKvATznnBiIiwC8APwrMgL/hnPvqN7uhTrLuPnr+Z5Dp3McQywrbbaDmFTYK0IMxN35ig2zbGwPB3BLOFoZBYZkvBUQTS+PCvo8nRqH/SQKkMlS99O7NgoVgXGCTAD0uqJYzqkZA2daEU0vRUcRDiw083USQW0ykiA9q4tsTHzNUgkzmmNUOalJ4o0MEqQ0u0N6BtguOjpMd6jRg/4mQcOzI9gyNSyOK9QbiINqbMz/VAAvZ5QFS1Xzh5q8xzO/8iUNTvwL88Dc893eBzznnHgU+t/gb4EeARxc/nwZ+6V2Mf4+IZBEwJQxQwwVyKlAU55c9r0VDqFMIJ/aQxMQGgqodVbYoU01jP2FF6QEzzZhgWnmr0DhMrKhbMXp3TN1OMKEivTXxhkJpad4sifdLTCREY4OJPDkL1iHW+hxXI6Y+1Ucqg23G3hp0DttI/WOtveKUIvr6bSYnAlo3DfHIonPH/HQLGyucFjY/3iWYGsJxheQL5b/DGnpXW6KInAP+zZtW2NeBTzrnNkXkBPC7zrnHReR/Xzz+9W983TuN/+YtUYIAV9ff9JreLLrdxkymh1vK297D3XHftK1JGPmtccHF8WZRjca9rfDN48Qxrij++LiHb/zjY32r8t0I/q69SQl3gLXF47ejLTr1dgOIyKdF5Csi8pWKexPwrSoLwIxG33SSDsd90xf08Bx7m/e+nbKAtyjrLeMevvHbU9Y3k2/bSnTOORH5lk/itzDhtE4586FnAXyBQlNjAyE6qCk7AVWm2P5kRbQZHiKkdO4Dv3UimMT7Rr0LFTq3mFhRdDRVUwinDl36ADFAcuDPqXBqicYelbX/lEdHRQPBaQhmPvpRtRzJnkcVx3uO3qWcohcuqBmEcGJwSlDGoQuD1O6QX0qsj5ZMziTc+YQl7s+RV1u0rrjFfcLspNC46ZidEEziOPGFGl0Y+PJ3Hvm7JSIn3rQl3jXn3hVt0R+TaU700lVQGjedEhmDOncad3OTdGUJe2ebpc9kTD/2CPFugX75MrKxjnntIrrX8/REa6uHdEWuKEi1xo7HBKc3qG/cpHN6A7TC3tkmKwpvmV25RgR0ux0IAqr3nCH8o6tIswFVhVvqIqMprpliXrtIsL5GMJ4gaQKioNeG/QNcUSJB4CP2d0UEyorexYTeZwrMwRD1vidQgwluNsOdXsf+E89IuLK4B9VqIVoh8+KPz9HdYf+EZ9j/Auw55/7egsCy75z7H0TkPwP+Ft5KfA74Refch7/Z+MeO81vl23KcReTXgU8CyyJyE/ifgL8H/J8i8rPANeCnFi//DF5Zl/Bm/c+8mwuUJEaffZh6peW3mMpQ9GKya0NMM2a2kbH9rCc0aV13JAPDvK9pblYMz4ck+57wJNu2pFs5qjSU/YT49oTpw22iYe2zAIu7jYY1OKgzn2LZf7pF2fYFEb0L/gyarmmCuaNxp2Z0LqB7sSTenTM73STeL31GIdMEM0O4O8GFekHpJ8iCjg/ApiHXf7hNNPJw7JWvFYiD+UqIL
hyTU5p01+IUtK7lPh754v3zYUfCce7Ea+5j638Vl8beHG9m3qRXgtMaKUrMxcvUP/gBxmcieq9OCG7sYJe6uFCTn8iId3NcoAje2EREcP0OTvlzq+4mBJOSqpcQjAr0YIrNEu+nlTVyewd7dp35iQbh1Csz3J1hX34d99H3E+xPqVZbRFe2PS9iHCLGYvpN9PbQl76Wlb9mgDjy23OgcVFIvdKi7IaMTwW0r1XE23P0zgHl+VVstLjGTNN8eQsXBnzx6q/c1w87GgpLTriPnvnrHpsOsKC0c3GIFBXFiTYHj0beGIi8gaGMI5w6ZiuKpZdn5CsxyW7pI+i1xaYh+tYu1fk1bKQpuwGqdCQ7OXpzH7vcQYqKut9gejJmclLT3LTEBzWz1WABV/PGh6ph+cv7h5lw2dzDnluH2oIW9N7Yk2kuiDlxzq8UY7CtBnd+oE+ybxeJUEd2O2d0PqVq+EKO5q2S5Oqex9WXlXecy+2jG0vE1LC9h7RbYAyuNlAUSBQijYzw89dZ2zxD3W8gn/8a6v1PwuWb1N/3CK3fu460W8gXrhI8dA57ZxunNbrdgjRBf+0iQatJOBzhjIWnH/Wm/cVrcGIVPS3pfnmf1nIbdTD1KKqPPEm0PUUNRuSPnyD+ykXYWEeNcx+FmU7hhdeQIEAaGYQhFB7AetcPc9Z5ZNSdbU6UFebCGwQbp3yB+8Mn6L4+xiYhdSMg3pp4XP7mtv+yvsMiOhIrrNXdcM9+7L/DxL6YwIQ+kRmNDLOVgLwvjJ6uQBzJ9QhVQbLnqDNP0FUs+Xto3BA61ypMpJgvK0wsi9Ih/ztfEpq3fD4KgWhsyXuK6SkhHtwlC+OQ7dSkjnAkHoyaQveSoWyqQ9IyJ3eJxuxb6pp9vs0nUQ8eDhk+XSGpQW1HNG4oXyTYEWbrDl2A1EIwg95F79O9+Du/wPjg7YshjsQKU4Uhu7CDS2Ns5ON/aMFFAemVOXsfXkFeCokHDl1asu2S+XKI7PmIuLoMJhRaVyaovELyivBc77CipOyExIOSfDkiXVCl3cWLqDNNnFIkB17pzRs5B4/6THT/9YI61ZQtTfeFHR+fTEPqZkh8bd+fXdP5vTPXeiyHU+J/hwF5b4XoCwHztZDWdUv3s68x+sHHSAag54rexZKiG5BtemMJ41DFO0RsjsoKe/8P/vfYQKgyhS79qgjmlrLlK/Z3PmzBCC62xNsaVQl6DnXDswiku76ENh5agtzHGOfLinjon68yfybVDV/Al+4b8o72K7QvTDcs4UgIZn51W+3PLhP7JGgwE9afL6ja3qm32q/uaGoXlqHnAb67Ou+mWepE2PqIYFdKwmsxwUxIdzxy6s5zCgGiA+/8dy/4Qr9X/u3PM9m/cXSNjmM/7K1yDCT9HpJjhT1gcqywB0yOhJUoSUxw4gy2nSGVwbQS6lZEtDWFQFF1E/aeTjCxP8yXXvX8wLNlRTTxPBpBDu3Lc6Jru9h+i7qboOY1xXKCqh1FNyDdLj09UjsknNUEgzmzM23yvqboCNHYw7/jA2/sxCNv9gdzR7Zdk9wcUXdTXKjQo5KqnxDtznxytKrvJWKtLyqXqqZe77LzTIOiK+gSOlcMTrwDbSLlsw0xJANL6/WhLxK8eP/qlSOhMBtpZo+v4pRgUoWeW6qWxkQtbCSMTwWMzzps5GjcUGw/EyHWpz/yud8kghziYUTdWEeMo+gFVFniowu17zChqpA6udu6AUZnepjId4cwieeOsiEUHV8RefCw9qmbUhAbUDV63o2oHdWpmGDuqJotgukCl7goMBTrO1ughK0PRhRLFicOVYsH7DjIl72VWTUd0UgoWxoxHf/eG/q+c3UkFCbGV+ED6NJDneNBQdUMQSl6l0rqZkTR9wRfvQslk1MRbAHO+YpJA/FBhZTeF6paAUHuaN6Yg4j3wbZyZidTwlHtQ1szRfN2jYlj6hpWX8iZrfkvQzg1uJse5z/ZCGhfnnmfrhcRjmriPYdJPEybu
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
BlJj52Om3BvOR3HuC2OxJRImiCPPo5NAhAwSYAyFj32vLnXPtRGrI9eRHsw+3LFdC5Al75f6wCdywWq8FVL49UWre2S8UpM2fGi2wj0Xp4ghSE/mRLuVbhQsfN4gtSOcOLXW+MTimTbUfYaOnMlLDw/PeT3rbPAp1mUr2sM+1NsFFxXCJSmICdQrH1vYHwLiwAAGD9JREFUFxND+5LDJOI5f0PfXlt2hHTLUPQ08dCQrk99euhrf3LLr+pIGEyMRU0K9N4YM5uhh4WvYqp9r9WJ391g8NQS2aalToSip8k2K6wWqo6mfWFMvtwi7OeowT4oRbsymFZI59UxLlDUaUC0k2OTgHo2IdyrCPtTKEoWqp5vJ5r4+vz55wrirQnTk23KnibZLgm3xl4usawR28LGmnB76hOXtUFPCx/tEPHUss3i98yvXqN44jSDhyLaVw3iHOnVkqoT0H1xQt2LiXcr6lQjZU0wsW+vCOfPCjLJceOJT4NUNVLUzfaC4vQso1MKcb6JL9mpKXqaZDunbPtq4ez8gGo2waUJrhHwRgsooU4DwmGJaUfYWCPWEV0ZYFsh1Qmfhtm736dA/EgOfUXUfk2yWzM8GzF4cg65vAnOoUpD2M99hnl/guRFU952Q/dJWUFVM/7IWQYPRT4stu8b5MVYsueusvXhLoMHY0ysvMEO+srsrbtYjp2O20DCCHfAvfg2oBfmMdu3rfS7CUc+gek6KeUnnkYXlqoTHKbyVemwsSA1XPoxC1ZIXw2JB75S1vcoQ9AEf9MrjvYVH/ytW748IJh6XZNw4jCxUKUQTiAeGqqWosp8H1qVerrZ8Skv+eFHG8y+WLP5tCYYC4tfrw5pIupEfF1G7mWzgqlvU1LVDVIeCmys2Hwq9HSxoae5bW05TAxVR9i/z5Je8ZQTC1+vfVL1j79wy+/q6I0wpcEaVJpiJ5PDx+DMaajN9chDs9/tcDBCDkfKQXjqDv7ng2NuHB0qSV4XSTngqn/d4xuMzjsdsUd+hN2Exgi2CRkdPNYX195wv9vh4Ms5/JLewo/z4Jgbp7I3CnsdhMle9/gGhnknptcj43Qc485wbLB7DMcGu0N8O6mQdwPHBrtDfDupkHcDxwa7x3AkvETRCtXKkHaG3dnF1TV6pocZ7BGcOY1Zv3pT0lI/9jBs7WK2d9DdLmY4vC4FBahOBzU/e6jBcnhct4vMz2LW1lGdDq4sEa1heQH6e7hp0540HhOcWsVlLWwa4776vD9vliFhgBnuI0pQ7cxTPiivLotuCFGc88+NQVZXcFc2seMxqtPB7u+j5+eQMMTuDZFTJ5DxlHr9ip92RW4rNHAkDEYQoJYXse0E1esgVY3tZQQ7HcqzC5j3LJNcHDB+ZI5oWDGdDWlbR/HRB0iuTFCVIT/VIdnwxpGyxsQh9dkFwu0Jan9CcXYBGRW+EvijT6AvXMPdtwIXGzLLlUWKk22i3Rw9Lhg9MkvnK1dwsynBA2d9NfF8Fxso9GCMnclg0rjptfF8YO5AB9P48Fig2Xl6gWSwwP4JTfdSTbw5IZ9PSC7sUj90Ar1fMn1knnYcXa9KHh11gssjGpq6WzjyC+d6IWP3Rz/un7c8S6fVTYN4BONThh/5xLOMqoSpCfnaxknySUS7kzPNQ8LQoLWl/PoM8Y4Q5I790/5YmjiqSS06V+ipULctwb5C5748zjw2ptOeUlQh42sZrYUJ+SQiSUvKMqCT5Qw2Osx+NaBqC1XXoae+KzQce2lHVeFplIwvu8P57HH/MfihT32Zti7408Epnn/hFHoYYOYrspkp40GLMC0JAkv4x11fa3mbQtLjEXYE8Y5knEVEi8hXReS3m9fHTDh3AW/Frf9ZfDP6Af4lngnnIaCPZ8CBG5hwgH/V7HeMdwh3SqxyCvgh4F8A/1BEBPiLwN9sdvk14Bfw1EU/2jwH+G/AvxERuR2TgJ3JmH73R3Diy5dVDUVPSHYtkyVN2YXTn77IpIoYlyG7V3qoq
fbazKU6pGyd+XpAMPVspZNlXzXltFfSO2DUtjFEg+tUrzaE0cM17eV99q9lSKlQZUOb1G0CzNoRXQ2Z+4ZP1XjKP3+/TfqeykHnDhv60gAxHNLJbr8v4OSn1uiEOV99+Qx6N0CVgo0dpm1QaY0bRriWYekPfF+Y/a233wzxS8A/AjrN63neSSacsEv7pT2fyu/EhFv72DRGX+vTemSF6ULIq+l9BBNh9kXDKlC2hWwDxic8b2Jr25Jd2AMtuECRXQ2oW5q6pYgHNTZUXst5q6CcjQj3a/S0ppiPmX1RmCz0SGtfVifOcyWmW4oyU9QptHYs3W/2Md2EqhvhlBBvXqfLO6ANdFojZYVLQiSvyM5Z1ienWZuDlRcsYiAa+pxdlWrigWOypADF7Ll9bKBQxdugkBWRHwauOeeeFZFP3ol17wQ3MeHInDPPvwD4OfrgcmtAr1+hDbT/683HZ81jdOM5b3ium7/4hm0H+95YzB2/wTaA9DWP3HBdB/2RB51Fb4aVX371DbcffObBdTUi6+DeoHq5wZ2MsE8APyIiP9h8Rhf4Zd5BJpxj3Dne1Olwzv1j59wp59xZPEnK55xzP8E7yYRzjDvG21k4/xzvEBOOnc0Y/sDHfAdmw4lrYl+PYUOwgfDBn3iOrbyNdcK3XlwFKxBadFrjjGDHIeGupveyX3Dnc0I54zskbdA4HBHonIZW1jsFZRempwzLD2yzsTHji60ig9mOcW0DtRB2SqphxNxXfL2JLny9Rt0S0i2LCYVw6g5DUzZsaju00H9E4Z4Y8YHVdS7szbF5aY5gr+H/7Rp0r8Rux8hcSedLLcRA/ZnjhfM9heNS7e8gHBvsHsOxwe4xHIlo/YFYjlOQz3q++DoVsJ4GYnzacfapy1zYnKfbmdB/dZZwT1HNWaQUbMtCYOl9PUJVjmjoKHqKsufFdA50waqOZzMN971gTj7vCZmnKxa9mFOPQ5K1kHylRmrBxRY11Y3IjtB7ER89Ca5nE1QN8dA2HFbe2bDhAQUfXP6044GHNyjqgGuDNu5SRrgnFAsWvTLFOkHWWtQ9w9KfeGfknYh0vKvQ05r2S3uItXSdQ6YF9coMemcfO5PhvixUv7dM++GIhWcty3u77L1vHqf8F5Ts1KjKEvSHOKVwSeA5CHs+oqFKQ9WNSM5vMXnPEuGoQg8LipWMqJ9Td2LKXoQuHDasiXcKivkYp4T2S7uM3jNLdnnsScUmFbYdeaYAY1F5fV3O3l33FJ3WoIX7yg7xOc21v7zC0iuG3jPrjB9b9pXCUQjWMVlx6FLRvjT2/9PbiXT8WcAp37hgWzFVJ0IXnsquXuqCdaiy5uJPGsJXYOd9GarKMDFkG4bBgwEmFiaLiqTfovfSBD2tqDveWGUvBBeSz2lGp1bpXiqouhFVJyTcrykWWgzPhEyXhHgAcd8yWUypW4JJoMrmmCwr8tk288/tM72vQ7RXUc6FqNr5NqW9HGzTeekcTinf1WIc7W9scP6nTlG3LVWmscFJxDiqjkZVjsHDmqVnCvKFEJM0bUzHbG73Fo58xvngFyVhhDPG19Y/8agPrE5y3HCf4kMPoHNDvhDR+doGdmsHCQNYXkTKCjPfQW/0qS+ve1oIrVGdtj9/WflCmzCAyxuY0Qi9tAhlhen3QWmCkyuYxRn03pj6lQu+jqOssNs7SBJjH74Pef48kmVImoB11GuXUVmGnUyQwEcYRSs/PYpgpzn6gfsQ6xh+YBmrYeaL69RrlwlOrWLnOp6B9EvPEayexPYHfnrNj0fYPYUjP8KklSCPP46NAsT4rsuDtlSdG8puyODhkOyqYecJTe/8QWjIH69KqNpC72JN68oUGwfUWeDTKbslxUx4mDPzHp8jGhSYJKDONDYUdh4LSDc9O0AwcbT6lrLtxdiqzLcidV+devVYEWys0ZMKGp56KWt/7zI+1YJSYC1bT3uCFhOL56TX0L7inYoq9dRI4dh7xUvPjP1s85V3SMrj3cLxC
LsZx6Gp7yAcG+wew7HB7jEcCafDzqRMP/kRVOkXlHUihxL0qvbMAdH3bLO90UW3DHYQIYVgZ2pkrNEThUktvXOabNPf0PfOelGcdNNRzPiISNnz542GXlHIxN6hKGcdVc+ipgqTGcQJUnonAQe25Yg3Nb3zjSiP9Tk1gHDifB2IdZhIeR3LhpnUaeHqn9PUqwVnTuxw8co84VpMNBSvTX2yRoygx77AtfuKl7y3v32cD7uncOx0fAfhSEyJrpNSfewpL2ufqEY6XhOOvd6XMo7+IxqT+Jr2aI9DrS4T+1S/OGivG4LcHkbMJ4uB11DZqil7Gqub9H3tyK5WlN0Ap6HoKIYPQjAR0g1Hnfj6fjGeTkIXfmqeeanEJNqvDWcCdO4OaWlVZQ81w8CXNajaUfYCtt/nrz0cCcm2Ix5ep1waPAzpVaHswfy3mob2z91LtA/HOPqRjgNiFafFS71riAY145MR0b4fMVc+oQkmQrFsyC5ogqkvrCm7EPeh7EHnkiUZGIKJYXgmRtVepHQ6r6kyL84dTtxNDoMuLcPTAftnoXXVF9jk80J73esuq2aUiYX2miVoGK9N5PfVuR9ZOjdeaNU2Kk21wwbCZDlk68NeMNzEnlgl2XVMF+WwIrlz0ZHP+UgNDtznj0fYPYVjp+M7CMcGu8dwJO5hdFLqjzzlayFEKLs3NHeLeA2vOUe2DsMHIRx5Bdiqg8/0Wl9nsfzlCqfxYtuzwaGnZiIhn1WHHlz7qkGVjulCQN2C0f2+2DQa+HuWGDCR73wRA7oCVUBn3RBMrb/PivcEW9teXE7nBhNr7y025GY2UGx+2KvMmtTzCHfPK6I9R9URxqccwdjLHNsQFp6r/aL7+B52b+HIe4mkCfLexw/1S5xWnvGzP8W0Y8peSP8Rr8onTZVS0jf0HwrpXaiZzmucwOyLOeHOmKqRBD4omQ7GNSbWBPsl49MpwdQSb+eYLKTsBeQ9H8ZqbRv2HgiI+45o7MNNVUsR5I5wVHt6wHZANChRtaXsRUSDAqktNtKH8h1SW5xSoIXdxzueZu+0Y/abkG1UmJbvXasyxWRZITXMvlgS9RtW0z99m8Lb7zakrFHnL4N4g0m37d3jyYRgfpbgXJ/4d3YZ/5WPEu3VRIMCvTNCzBKtL5+n9eDqobI6V7cIN7fRD6zivvwc6slHUaMpIeACTW9ti3pjE/X+xwiu9tFrl0mffuKQUVRMSnbuGvWrF6k+9RR6akm/uQFaQVESZi0YDJF2RnIpR8IQN56gAZyFMPJJzYYzZPaZXXS3i6tr7BMPEmwNcWGAa0Wo0ZTeXBu9vg1B4AXvnLutLPDxlHgEceSnREli1MOPItZSLrfR09pPM02pQLSTc+mHe4RDmDnvlYlMrGht5lTdiLrlR2Z2eYIUFTaNsIFieH+L7gXfHKf3S/pPdJn7Wh+bRn7qaZya0ZkEq4XZc/sM3pMRTnx4Kp9VdC/Wh1Nr9qrvEi0XUlRhCAY5Ng0R60sEpFG3JdBQ1RBohu+dY7qg2H26ZvkPNOmWr6E0kWL7fRHLz+aMVyJaOzXhsPTlBs8d8RKBzswp94Hv+lnA1zmo2nm95aGlSoXRfQpxUHUc5UpF65XI9y5HXhK46lmCiSJd933R4dQxnfURkwPpKhP7yIIYyDZ8X/J4RSEGilnPvaFKwWn/fZQzlvZFTb7okAqSHSG95lm9fWrGny8aNekV4w5TKuJ8FbCNhPGSYnQ/JI/sUZYB+rk2uvRtUCaG6bIj3hVfjbzn0JXjG5/9JfZ3147wCNubkPzPLwHX20izG97vfRvnTN98lzva5+0iBRbf4jHKjW/53pEwmOum5J/8CFI7posBre2afFajKygzoW4Jw0cswUioM4cuhdaGV08XCyZxiBWSa17MLZgaposhRVcakTU/yqqOHxXxwHf628ivpWzkK7Dq1BH3hbLnqFNHsq2oW/74aAi9V6rDNeIBSbNTQjD18U6n8GwFTSWVU8J4WbPzl
AHtUBNNtKfILrvDa5muWOJthS7wujGA/d9HvLZeFTXtr67jshbpeQdakZ53niSs06KabWGjhHjgF8G9V0qUcV4efuwDuKpyxBv7/ovSQrSjoLa4Vkgxn3jSr/0C0/Uq7FJUuFCTL6fUqWL/hGb1D6eMziT0XnUk1wrGpxJMJKTXapyG1vo+2bmpJwqb6yCTApkWEIW+3PygIPaAv76qiU/Pky9ktC9b5r+wQXlyhnI2YlppwolFF5rWliWcWjrPXoEwQI9v7SUeCYM5rbALvcZhiLGRb2ZAQZ1o6lQzfKSm83JAnUE+H9O65qU9OmuW4ZmQuO+o0y7p5QlOCVU3opwJDiUKB4+2iYct4p2K4SMJ7fXy8L4zWtXowrH1wZRo6KhaUDzUIp8Tkl3H9pMh8a6jtS6MH13EhkKyVVCc7hD3PaObKmovQpDXmCjwdfbWki/FINB/rzBePUl6tZF1vFiw+eGEuXM1w/sC9luKaLDkIyVXb02/d0exRBG5ICLPicifisgzzbY5EfldEXmpeZxttouI/OuGuujrIvKhNz2/A5zDZjEu1NhYYyOFiTXRXolYR3YxYHLSUrUdC9+oCaeOeOAousoTscwIqnCHyg11W/tmBevTHNlGhZ5aJidiWls1wbjCRgob+T7lfEHIrvrQkzKQbtV+6rQQ7zp6r5ReGwZobeaowk+9B10skteHsvdiTKPrqUi2SqYrltlvOUzoE69B4di7P8bEsPVkQDRyBFOfBMW6m+SsXou3MsK+2zl3IznKzwO/75z7RRH5+eb1zwE/ADzc/H0Uz45za11BvGdVzbWwDSmKDQQTC7p02PmYvbM+M9x5VTF4X832EwHxwDFe9SI0xZwvrqkzRbGYNh0kgg2EnSdSgqlXsq0TP2WF+zA6kxIU3r3OZz2bznjZZ7frRCjbQj7vWW/CfSjmgibWCflCcri0UC1NuF9jw4P4ZwQiOO0/a3g2Rpam7D7eItoDq4W664uA6gzCoRcujQcOE2svxhrcehzdkVsvIheAp280mIi8AHzSOXdVRE4Af+Cce4+I/Nvm+a+/dr9bnf944Xwz3ol8mAP+r4g821AOASzfYIQNYLl5fkhd1OBGWqNDiMjPiMgzIvJMRXGHl3GMO50S/7xzbl1EloDfFZFzN77pnHMi8pZW4DdSF/XCJafnFpGs5aezMIAo9NSsEx+pGHz8FHG/ZnQ6Ipw44oEPxk5WImY++y3fDlRUyMQbX2rjW5DWt7HLc6i9MeWpOYJRgdrew6UJ9Pdgtod56RV2/45Xau++WlLMhbQvTQjWthl+5DQmEtqXc4IX1qCukXYbl7Vgp+9bm0RwtfGtRoAz1pcL1DUszTN4cp6yK8w9P2HwcMriH67jwoDJQ/OEkxo1rVF5hdoZQqCRK+Ebf2l8G5EOEfkFYB/4uxxPie8K3lYsUUQyQDnnRs3z7wX+Gdcpin6R11MX/QMR+Qze2di7nbEAyFrIe5uIuQOTBjf1+ppWwJVPxHQvOPqPQfdlf5iqoU49jWvZEboXa5LtHKkt0xMprctjpqcyJgsaG0H3Ys1kKSDpG5yGYN9QzIUEU8vuYwHtNUs+14S0LMyeKxifjBifFDqXLJ0LUx8K6/p2WZz37IJRgdOCTUKc4Ft+8VmIfCXj0vcFRENFsuUTpdmmpegJxaxfPLfXHOm1mtb6CBcF8PzbS68sA7/pKRIJgP/inPusiHwZ+A0R+WngIvDXmv1/B/hB4GVgAvzUm36CO3CHHSb11HZVFqBDhdXC3gMh1aMT6o0WOJguCUF+XTJqvOpItv2XV7VDdOFjftVcQt1S1C3fgzV4KGT2XNH8CDTB1DBqR1z7kMa0LKAIxlBlUGeOyUpEerUpG3c+gmESTZAb6sT3KKui9v1g1qL3i+tN6VEAzrF3f4ibK6lsSJ0Is+cgGhnymYCy52hf8j+6vftDdJmhyluLvcERCf6KyAh44W5fxx1igddwP74LOOOce8MQ5JGIdAAvO
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.539]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 8, training avg cost 0.538862\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO2deZgdZZX/P+f2ml6ydmftTjohCSEEEpImKJEAYkwCI0FAhFFHcUFHEZRRBseR4UGdUfip83NkFHwGt1EB94zEHyIiOChLQkIgG1lJOgnZyZ5e398ft7r79k3f7tvdVdW3qr+f5+mna3nvOee+VfdbVW+dOmXOOYQQQkSfRH8HIIQQwh8k6EIIERMk6EIIERMk6EIIERMk6EIIERPy+8txRUWFq6mp6S/3QggRSVasWLHfOVfZ2bp+E/SamhqWL1/eX+6FECKSmNlrmdZpyEUIIWKCBF0IIWKCBF0IIWKCBF0IIWKCBF0IIWJCt4JuZg+a2V4zeyXDejOzb5rZJjNbbWaz/Q9TCCFEd2Rzhv59YFEX6xcDU7y/m4Bv9z0sIYQQPaXbPHTn3NNmVtNFkyXAD12yDu+zZjbUzMY453b7FGNHXvsrbP5jcjqvAObcCGWd5tj3nRU/gMN1yelpV8DYWcH4eeWXsHddcvqMS2HChf77aGmB574DJw9BIg9mvQeGVvvvZ/0y2LUyOT11IVTV+mu/8WTyezScgMJSuOBjUFDsn/3DdbDyv6GlGUaeBTOu9s923Qp49f8lp8++Ckad7Y/d5d+DI7ugtALm3gRmfbO39c+w9WkoLIG5H03+7y2rH4H9G2H0DJi+pHc2Xv091L2Q/P1Nu6L3sQwA/HiwaBywI2W+zlt2mqCb2U0kz+IZP35877zVPQ9P3wt4ddxLRsD5H+qdra44+Qb8zy3t8/vWw7t/5L8fgKWfhIZjyenXnoEbl/nvY/8GeOxz7fOWgItv99/Pss/AkZ3J6V0r4b0/99f+9r/CH+5qnx83ByZe5J/9VT+BP/1bcrqw3F9Bf/qedkE/XAfv9OFi9vgB+O2n2uenLoRhNX2z+Yd/gZ0rktOjzoEpb+u9rV99FFwLDBrWe0H/3e1waCuUj5Wgd0OoN0Wdcw8452qdc7WVlb08q553K9z1Bnx2s2e0xb8AU2m1u/geGDUjOD+QPBu88BaYeHFyOigfANf9qON8EH5mvx+qzgcXgI8Wbzss+GLyv98+Wvvlwk8GY3vs7KTg+mW71U71Be0++kpLM5SO7Gi/t7T+blr68PtpjSGI/Slm+CHoO4HUa/cqb5kQQogQ8UPQlwJ/52W7vAk4HNj4uRBCiIx0O4ZuZj8FLgEqzKwO+BegAMA59x1gGXA5sAk4AdwYVLBCCCEyk02Wyw3drHfAJ3yLSAghRK+I/pOizoVjNyg/SeMZpoPyEZKfQPos6O0SZPwB2A66j/ti3/m0X7d+NNDfYDyIvqCLgUlfc61jh8/9of6NJBEW9LB2OAvHl1k4P6LQfATtJ6L2g+ob37erBD2KRFjQhRBCpCJBF0KImCBBF0KImBADQQ8rKyRAXNCZIZ3YDcVPAD5O
izus7CM/zAXRN0H3sU9ZLn3a31zaf5GJGAh6WGhnGhBENjXOu4kZ2fiFH0RX0MNKqwotfSukbJpQ/ISQsRNZ+wH1jd82lbYYSaIr6GGifTsHCXKjaIOrD6KJBF0IIWKCBF0IIWKCBF0IIWJC9AU96ml+SeMZpoPyEZIfFecK3nYgaZx+pxumT/fUjPMhloFB9AVdDEyUhZGGslyEBD1LYlacK5SUv4gWz4Jg41dxLhEgEnQhhIgJEnQhhIgJEnQhhIgJMRB0FefqsY/Q/Kg4V0dzUSnO5ZN9FecKnRgIelhoZxoQRD01Lurxiz4RXUFXca4c9qPiXF0YVnEuERjRFfQw0c6dg6g4V7CoD6KIBF0IIWKCBF0IIWKCBF0IIWJC9AU9rDS/QFFxrl7Z72TWX1dRKs7l55i3inNFlQgLepjZJwS/M6mWSy98BEEQAplC4LVcfNpPdU80kkRY0MNEe/eAQllNaJ+PJhJ0IYSICRJ0IYSICVkJupktMrMNZrbJzO7oZP14M3vSzFaa2Wozu9z/UIUQQnRFt4JuZnnAfcBiYDpwg5lNT2v2z8AjzrnzgOuB//Q70MyoOFePfYTpJ3D7Ks7lO37FqeJcoZPNGfpcYJNzbotzrgF4CFiS1sYBg73pIcAu/0LMFbQzDQgimxoXUjaWyGmyEfRxwI6U+TpvWSp3Ae81szpgGfDJzgyZ2U1mttzMlu/bt68X4XYw1rfP55ofFefqoYuo2ldxLhEcft0UvQH4vnOuCrgc+JGZnWbbOfeAc67WOVdbWVnpk+sQ0M6dg6g4V7CoD6JINoK+E6hOma/ylqXyIeARAOfcX4FioMKPAIUQQmRHNoL+AjDFzCaaWSHJm55L09psBy4DMLOzSAp6H8dUhBBC9IRuBd051wTcDDwGrCOZzbLGzO42syu9Zv8AfMTMXgJ+CnzAuZDuzkQ1W6OjswzTQfkIyU8otVwC3E6RquXiJ6rlElXys2nknFtG8mZn6rI7U6bXAvP8DU0IkT0a8xaRflJUxbly1o+Kc2UmMsW5dICIIhEW9DDRzp17BLhNJGZon48mEnQhhIgJEnQhhIgJEnQhhIgJMRB0FefKzkcXPn31E3AKpopzpRv1/vs45q3iXJEluoIeei2XoHcm1XLpmYuA7LcKT1Rrufh1oNaN4UgSXUEXQgjRAQl6NuhsJQdRca5gUR9EEQm6EELEBAm6EELEhOgLuopz9cJHSH5UnCt42wOqOFcfQhkgRF/QhRBozFtApAVdxbly1k9Y3yMQVJyroz1fjPloS3RFhAVdCCFEKhL0rNAZRu6haovB4mMfqD9DQ4IuhBAxIQaCHoNaLh3chpS1E0rNGNVy6WguyFouPqJaLpElBoIuhNCwoIAoC7qKc+Wwnwhn0qg4V0d7/hjz0ZboiugKuhBCiA5I0LNBd+lzEBXnChZluUQRCboQQsQECboQQsSE6At61ItzhZaGp+Jc/U8AfdNmx89hjVwtzhWlbd0/RFjQQ67lEribsGq5EFIGSkSzXNodBGQ26FouuWhPY+hhEWFBDxmdHQwQor6dox6/6AsSdCGEiAkS9KzQJWPuoeJcwaK0xSgiQRdCiJiQlaCb2SIz22Bmm8zsjgxtrjOztWa2xsx+4m+YXRHx4lynFc0Ky08Y2UFRL87Vmb8csdVuNACTKs4VVfK7a2BmecB9wAKgDnjBzJY659amtJkCfA6Y55w7ZGYjgwpYCCFE52Rzhj4X2OSc2+KcawAeApaktfkIcJ9z7hCAc26vv2F2QujFuQJ3hIpz9cRFhO0HWZwrJ+1pDD0sshH0ccCOlPk6b1kqU4GpZvaMmT1rZos6M2RmN5nZcjNbvm/fvt5F3G/ocm9AEPX01KjHL/qEXzdF84EpwCXADcB3zWxoeiPn3APOuVrnXG1lZaVPrkNAd+lzEBXnChZluUSRbAR9J1CdMl/lLUulDljqnGt0zm0FXiUp8EIIIUIiG0F/AZhiZhPNrBC4Hlia
1ubXJM/OMbMKkkMwW3yMMzNRr+XSHzVWAiVmtVx8tR9kLRc/US2XqNKtoDvnmoCbgceAdcAjzrk1Zna3mV3pNXsMOGBma4Engc865w4EFbQQIh0Na4gs0hYBnHPLgGVpy+5MmXbAbd5fSKg4V876UXGuLsxGpThXzhoTXaAnRbNFl3siEmg/HchI0IUQIiZI0LNCl4y5R8Qe/IkcSluMIhJ0IYSICTEQ9Iin+fVL0ayw/MSgOJef9gPtGx/PgoNIqVRxrlCIrqDHspZLWH5CyqaR/c4MB9P9quUiiLKgCyGE6IAEPWt0uTcgiHp6atTjF31Cgp4Nuks/wND2VpZLNJGgCyFETIi+oKs4V+76UXGudGP+21VxLpFC9AVdCIGGiQREWtBjlk5oqDhXj30E6iAgs1EpzqW0xSiSVbVFgS73BgxR385Rj99fnly/l0eW7+i+YcjcMHc886f6/9Y2CboQIrY8/MIO/rh+LzUVJf0dSgcOn2wMxK4EPSt0yZh7qDhXsMQnbXFiRSmPfXp+v8YQFhEeQ29FtVxy149quXQ0FUTfBN3HPmSn9BnVcsmWGAi6EEJXkQKiLOixLM4Vhq8w/ISVSROk/QANBxG7slw6xQ2ws/roCroQQogOSNCFSCXq6alRjz8ABtI9bgm6EELEBAl6NgykQ3xkCHKbaHv7m7bonynRNdEX9MCyFlWcq89+wuhDFefyJvxUzSCKc/XBVlvWYs8/P9BGoCIs6HGr5RJCZkibnxB8qJZLBrOq5SKCQ0+KdkN9czON9U3k5+WRaGqhMP/0Y6BzjtV1h5kxbgh/3riPkeXFTKosZe+RegrzE4waXERzi8PMSBhs2X+cjXuOUlsznEHWRGmI3+eVnYeZ3NjCXzfsZdzsowwdVMCa3UeYXFnGGycaWbf7CGbw7JaDVJQX8sS6vcwYO5grZ43lB395jUUzRnPsVBPHG5qYNnowa3cd5pt/3MTYIcWcamphhcGxU03UvX6EYZzi2N6jVJYVs+PQCYryE9Q3tbB21xESCWN13RsMKsjj2S0HOGvMYC6cXMHja/cwf0oFb5xopLGlhVlVQ/nxc9t5Yv0eyooK+NIZu1kUYn8FgSOzxDU1t5AwI5EwnEvuM53acI4WB3ne/OGTjQzJYHPnGycZVJBHWVE+m/Yeo6aihP1HGxg/ov1x+GP1TWzdd5xzqpJWXj98itHAqaZmmuqb2H7gBIMH5bP9wAn+sG4v8yaPYMVrh5hZPZQ1u47w5kkjONnYxPjhJew4eJJHlu/gs/NHMqk3HSR6jQQ9A0dONTIYuP+pLVxw6DAO4/p//p3vfgpp5NVi381m5MvL1vGdgma27T/Ojd94OqvPbNp7jF+v2gXAU6/u67TNrsOnAKgvaubHz23nTDtFvZ3iqq9n5+OlusM89EKyiNL/vLSr0zanGutZ9vLrLCrMymQvCfYafeWON8g7dZh9zvjQHY/22d4ZtpMniuCJ9Xu5Og+u/Nb/strt7pPNpwpPcIB8Rifgjl+8zK9/dvqh4sFntnaY/+YTG09r88wrm1gd4r4tIj3kEiwvbDsEwJZ9x0Pxt9sTRBFfGptbOBpQUaZcpqllgA1k9yMS9AwU5LV3jQthDHDbgWAPHHWHTgRqP3yCLM4VjNl1u48GYzgA/NznTzS2+Garpwy0Q0nkhlxONjSzZtdhnt6wm9uAr/1+Pf/xu0f5+0vO4Nt/2uybnxrbzZ+KfDOXEfN2uYPHGziZ30xBcwst3lh96xhqfVMzpxpbKC3M43hDM2VF+RhwrKGJ/UfrqRlRyv7j9QwZVMC2/ScYVlrA8JJCjtc3A/Ds1gP88rdruT/4r9NBBoLRxWR/7T1az0hg1+GTDDreQFFBguP1zRQXJNhz5BSDBxVwvL6Z/ISRn2es2v4GcyYMY/O+40yqLOXF1w4xcnAx4BhaUsj2gyd4asM+7sx3JIAdh05SDew+fILBZQkOn2ykrDifuoMnGVFWyJ4jp3AOhpcW
snLHG8weP5S1u44wrLSQ4vw8GppbOF7fxBkjy/jdy8khkDkt7cJmOSw1qbH1Jc7W7d/c4pIzzoEZpxqbaXEO55Jj9xVlReQlkq1PNDRxsqG57T5CwqDMO7w0tbQw+Y5HOWvMYNbtPtLmZ1JFKSPKCnlh2yHKi/M5eqqpQxzTRpf3+jtEjcgJ+vu/9zzPbz1IHs3cljI+56eY9wdrdh6B3XuZasdZEMBY/ZsSjRDo2HO4fOE3r3B/IXzpt2tZtnSwb3Yn5G/jXQXN/PT57dxeABff+ycaKPDF9q8C7P/cPTy0M/GfHsX1YlBgdVEjg807MEAHMYdkksGW/ckr3HQxB1j/enSujPpKVr1rZovMbIOZbTKzO7pod42ZOTOr9S/Ejiw6ezQQzjBI0k84OCyU7xSGH0cY2yc4+80tLYHZD67//bXpZ4xRONjEhW4F3czygPuAxcB04AYzm95Ju3LgVuA5v4NMZVBhXveNhBBiAJLNGfpcYJNzbotzrgF4CFjSSbsvAl8FAk3XSL1ZGSa5POYpRCvaTwc22ajjOCD1Lat13rI2zGw2UO2c6zKx1sxuMrPlZrZ8377O85m7oyBPT50JIURn9Pl018wSwNeBf+iurXPuAedcrXOutrKyd2+8Tj9DD0reU+06F95BJKgzrHS7wZ3J+ZMhkYlWm0FUMGkldfzYz+/gV/ZIJpt+YaT2b1+yXFq3lbXZ7W08fY1loJCNoO8EqlPmq7xlrZQDM4A/mdk24E3A0qBujPbXkIsQuUxYSQIit8lGHV8AppjZRDMrBK4HlraudM4dds5VOOdqnHM1wLPAlc655UEE3DrkEmb2STh+IIwiRs6FkeUSjo8o2ncBnWf6XVXQ3ywXHWzColtBd841ATcDjwHrgEecc2vM7G4zuzLoANMp1Bm6EEJ0SlYPFjnnlgHL0pbdmaHtJX0PKzMFnVQ7FMIvoj5Oq3PhgU3k1LH1EWEhhBAdiZyg94ecR/ucLZ4EOS6rMV89KRpVoifoaQX/w0rzC4rw0gm79huE3aBS6jL5y31S+8YfAikiYP7E6de+baGlDESf6Al6fwcgRA6iqwoBERT01mpq4e3AYaUthnOu6Qj+EjgsH1G0H9S5Zm4Lei7HFi8iJ+jNA+013kIIkSWRE/T+SnKJ1lht7zm/ZhgAX3vXzMB8lBcls2Xfc8H4wHz0lqhv5/T4hwzyp5Z7K3++/VLKiyP3GoUBQ+QEPddO0P/v9bP4yYcvYPO/Xs5HLpqY1WcWnj0q4Kgy092P8Wcfu5BtX7mCa+ZUccmZyXo7s6qHcsPc8Vw9u70mW37CuGZ2FU9+5pK2ZVXDBgFw4RkVXfpYeecCXv3SYr64ZEZbPLPHD+Wm+ZOoKGt/TdRZYwZz7ZwqvnjVjB59x6gyvLTQ97frPHrLW9j2lSt8s1c9vIS/3PHWjOsL8xJMHlnmmz/RMyJ3qG1JU/TUO/J+0nqmc9V5Y5mwq4wia2JmYggLpo9i+8ET5CWMySPLece5Y0l4lw2fW3wWS2aNY/m2g1x3fjUtDlbveIPSonzy84yxQwaRSBjlRfn8069eZtfhU/zrFZPg24F8hU6/zz3Xnsuw3xbynunjufbtC8nPM041tFBSdHqd+e/fOJf6pmbyEwnyEkZLi+Pfrj6HovyObdfevZDGZkd5UT72JeO86qE0FYwkcayZle9bQFFBgpMNzZR6Z+b5KU/7vnzXQk40NFGcn0ciYXzm7WficB18OOe44pwx5Jlx15f+nFzWVvDJ/+2fatHf4lxd233xCwsA2LjnKAu+8TQAH5w3kfWvH+HwyUYumlLJlJFllBbl8bH/fhGA98ythpdg3uQRsA0qygp54Ko5fPupzcw7o4KqYSUArLpzAT9fUUdJYT5vPmMET23Yy0VTKykrSr6yLS9hfOqhlfzr1ecw+Af5jKwcDnXwj4umcvcFC9m09xg1FaUAlBcXsO0rV7D3yCkqyoo4eqqJwvxE27sK6pua2X7gBOOL
T8A3+r6t2rNccuxsLgeJvKAHTX4iwbihxdDcyG8++JYu2yYSxoxxQ5gxbkjbsgsnd362+pVrzk1ONAT7cuh0EmYYyTPsfE9g0wU6ldR1iYRRlDi9bUnh6btRvneQG1ZamLFNZ58v7ORJYDNjeGl83p83q3ooFA9l2/s6P3OeMqq827PqtvV71sJLJIV7G3z3786H6tG83XuzVytDSwr58EWT2uYnVrRfTY7y3uD3m5u9/bu4ALyD7sjyYijKZ2b10NNiSL6TFYaUdBzWKcrPY8qocjhe3+V3EP4TuSGX9vfshjSYHpKbcF7bBmBgAfsxI+iOi3JxrkD6xu9t6qM9nVeHR/QEPdcG0YUQIkeIoKD3dwRCCJGbRE7QXdhj6K038HRlkBMUB/yS8Og/AqP9dCATPUFPmw/6Bzh9zOBgvaRn7YRVmyaoA1QHu/77mDaqrIPlqL6Czr++CWA7+rUNnV5BFzaRE/S3nTUqqwdSnv7spb74i/4ZW0fKB0UusakDrQ8liXSitafOmTCsv0OIJZET9ML8BF9+5zn8+7tnZWwzfcxgxo8oYf0XF/HbT76Fxz89H4ApI8u4e8nZfPySMwC4bcFUzhk3hPe/eQL3/e1snvrsJW02PjivBoDSYn+ftMvE+RNHdJlZ8dH5kzKu6wn5iTyC//EHl0nzNzPHtvvIwD2tKaF9oKtzwdo+iFFxQV4wfZPTWS6n25pZdXoaZE+YNrqcr71rJnNrhrcte/ELCxhWcvrvdWbVkNOWxZXInu5cdd44+A3cetkUbr2085zd4oK8tpzw9Lze2xdNA+CWy6Z0WN7Wbt8GWN6eTx0000aX8+Jrmdd/+KJJVJYX8aVH13HzpZP51pObOqyfM2EYF0wczn/+aTMAN82fxJJZY7nim/8LwDtnjYO1gYUfGtm8gvDq2eO4/RerAXjwA7Xc+tCqtqJunfHWaSP54/q9LJ4xmvNPDaN8Xz4c67ztzz72ZvYeqWf5a4c4t2oIR042su3AiYy2504czvNbD/KFv5lO5V+KmDJyGDQd7vY7xJ3OhBfgq9ecwz/+4uVO1yUSxsovLOBEYzPjhiafSr5mThWvHTjO2KGDKMhLsPLOtwOw72g92w+eYOzQYt/LH+QykRX0uFE1rIT6sYOZ0FDCtluv4Bcr6vjFi3Vcfs4Y3vumCQDcOG8iIwcX845zx/CZhWcCsOPgCQ6daOBc74xnZvVQbnt4FbctmEpxQV77AWrzk7EQ9HT+z3Uz2fBkORv2HAXg3mvPJT8vwW0LpvL1x1/l0jNH8vJdC6m549EOnyvKT1DflHyo4ZxxQ3jwA+cnVzw6GA5mPmicXzOcdbuPAHBdbTXvfdMEPvLD5Ty+dk9bm4umVLBqxxscPdXEwrNH88hH35xcsaYYCvIg87FlwLBoxmi+9virLLvlIs4cXc4Z/5R8w+W7zx/P957ZxvrXj7LslouYPnYwfDkPGpPXZMNKC0m/PpowovQ0+5XlRVSWF522PO5I0LMm2BsyBkyuLINdySuCa+ZUcc2cqg5t8hLGlW1DDkmqh5dQPbykbX7h2aNZc/eiQGPNBRaePRo2Js/YH/v0fA6fbORbf9zIklnJejO3XDalw9XX+OElbD94giWzxvKbVbv4wLwacHD/01s6Zk5505dNGwWbk091/nXHKb5+3UzKvPH7s8YM5vnPX0alV3dm2uhyHl+7h6mjynh1zzGGlRTyrjnVPPjM1tCzsnI5G2vKyFJm1oziutqqrJ6GFT1Hgi4iyQ1zx8PG9vkhgwr4/BXTM7ZffM5o7n9qC7PHD+M3q3ZRPayEWdVDuf/pLbxt+unF0i6YNBw2wwPvm8PrJ71H2VMYWV7cNv2JSydTXJDHlTPHctE9T/K3F4xnwogSVu04xNWzq9JND1iW3TIf8rsv4fC5y8/itodXMbHi9DNv0TXRF/Qw0u8CfVQ+Pf5gX60Qqp9AXKQlLGa5/f9x4TT+/uIzGDKogJqKUuZPqcDMMpwl
tm/v8qJ8ygd3XT2wuCCPT1w6Geh4r+aXH5+XIfbs4+6WQPZ/R0/7N7Od1G+dna2Lp1aywitU1iGGHL76yBWiL+hCZEEiYQwtSZ4dXjy1sp+jCYLcTVssL8qHhq7bfOPdM8lLRC7pLueIuKCHVMwqDCyEollh+QmhOFfwmyUgB0H1TQ6nLZYXF3Yr6O88T0NTfqBDohBCxAQJuhBCxAQJerbohswAIerbOerxi74QA0EPIyskvOJc4RTNCstPkIWjWrdJAD46jB/7aD+Qvgmqj33o39bva33NmHFp/0UmYiDoQohcznIR4RFtQQ8rKyQUQsgMCc1PWJk0UbQfUN/kcJaLDjbhkZWgm9kiM9tgZpvM7I5O1t9mZmvNbLWZPWFmE/wPVQghRFd0K+hmlgfcBywGpgM3mFn6M9YrgVrn3LnAz4F7/A5UCCFE12Rzhj4X2OSc2+KcawAeApakNnDOPemca60h+iwQw6cEdENmQBD1bKaoxy/6RDaCPg7YkTJf5y3LxIeA33W2wsxuMrPlZrZ837592UcphBCiW3y9KWpm7wVqgXs7W++ce8A5V+ucq62s9Kmehopz5a6fwApHgT/FozKRsr19tR9kcS4/91F/i3O1h9ZLW7rqyJpsarnsBKpT5qu8ZR0ws7cBnwcuds7V+xNed6iWS076iUX2kWq5+GjMR1uiK7I5Q38BmGJmE82sELgeWJrawMzOA+4HrnTO7fU/TCGEEN3RraA755qAm4HHgHXAI865NWZ2t5ld6TW7FygDfmZmq8xsaQZzQgghAiKr8rnOuWXAsrRld6ZMv83nuIQQQvSQaD8pGia6MTNAiPp2jnr8oi/EQNDDygoJyk1YRbO68RuKUz9MhpWtE4B9l3HGL6M+mfSpiJhv+3bQmVPxIQaCHga6S597BLhNQqvf4yd+x+yjvUj2ZzSJtqDHIj2uzREqztUTF1G1r+JcIjiiLehCCCHakKALIURMkKBnjW7GDAiiftMt6vGLPhF9QQ/rlW2BEaNaLqFk7ISUFRSI/SBrufiJX3H6tM85ZblkS/QFPQx0lz4HCXKbRHF7K8tFRF7QVZwrJ/0EVYDqNB+BOgjIrIpzieCIuKALIYRoRYIuhBAxQYIuhBAxQYKeLbq7PkCI+naOevyiL8RA0ONWnCssPxEtnHXaK9eiVJwrhFfy+WIy5RV0fhTnar3B6kdxLh2wuiTagh5aLZe41FgJy0/Ea7kEnQmkWi4iIKIt6EIIIdqQoAshREyQoAshREyQoAshREyIvqCHVpyrv4p1BWU3qoWzAvYRaDZQCMW5fLGr4lxRJfqCHgYqLpSDqDhXsKg4VxSJuKCrOFfP/aDiXNk5CMisinOJ4Ii4oAshhGhFgi6EEDFBgi6EEDEhBoIeUlZIWNk0ofkJwUcYr7nz3UeAWRQdYvfLTwD94Vecvu3bquWSLTEQdDEw0Y22juRwfyjLJTSiLegqzpWjflScqwvjKs7VbzbiT7QFXQghRBsSdCGEiAlZCbqZLXgslpkAAAgUSURBVDKzDWa2yczu6GR9kZk97K1/zsxq/A5UCCFE13Qr6GaWB9wHLAamAzeY2fS0Zh8CDjnnJgPfAL7qd6BCCCG6Jj+LNnOBTc65LQBm9hCwBFib0mYJcJc3/XPgW2ZmzoVQSefFH8Krj/lvt/Fkx/m96+C+C/z309LUcf7o7mD81B/rOL/uf2Dncn99pG/ug1v9/y4nD3kT3k2yJ74If/kP/+wf2Q1FZe3z/7UAEtn8TLLg4BaomJKc3vmiP33Ttp96/fHrj0Nhad9snjzYbu+pe+D57/bOTnODF5pn67tvhURez+24FrBEcv/69rx4ZM1cfDvMuMZ3s9nsqeOAHSnzdUD6ntjWxjnXZGaHgRHA/tRGZnYTcBPA+PHjexlyCvM/A6+/3Hc7mZhwIVTNBQwKioPzM2YWTL4MRp4FDccILNe2+K1JH/NugdeeCcbHqLNh2uUw7rzkDzEIysfA2PNg
7k1wbI+/tivPhPEXwuS3wYxroaXRX9vnvQ/qj0LJcP/sTpgHtTfCif1J232lchpc8FEYUgVHdvbN1rhamHUDrPxx7/ty5HSYugg2Pnb6CVBUKR4aiFnr7iTazK4FFjnnPuzNvw+4wDl3c0qbV7w2dd78Zq/N/s5sAtTW1rrly30+QxRCiJhjZiucc7WdrcvmpuhOoDplvspb1mkbM8sHhgAHeh6qEEKI3pKNoL8ATDGziWZWCFwPLE1rsxR4vzd9LfDHUMbPhRBCtNHtGLo3Jn4z8BiQBzzonFtjZncDy51zS4H/An5kZpuAgyRFXwghRIhkdfveObcMWJa27M6U6VPAu/wNTQghRE/Qk6JCCBETJOhCCBETJOhCCBETJOhCCBETun2wKDDHZvuA13r58QrSnkLNERRXz8jFuHIxJlBcPSXOcU1wzlV2tqLfBL0vmNnyTE9K9SeKq2fkYly5GBMorp4yUOPSkIsQQsQECboQQsSEqAr6A/0dQAYUV8/IxbhyMSZQXD1lQMYVyTF0IYQQpxPVM3QhhBBpSNCFECImRE7Qu3thtc++qs3sSTNba2ZrzOxWb/ldZrbTzFZ5f5enfOZzXmwbzGxhUHGb2TYze9nzv9xbNtzMHjezjd7/Yd5yM7Nver5Xm9nsFDvv99pvNLP3Z/KXZUxnpvTJKjM7Ymaf6o/+MrMHzWyv9/KV1mW+9Y+ZzfH6f5P32azei5YhrnvNbL3n+1dmNtRbXmNmJ1P67Tvd+c/0HXsRk2/bzJKlt5/zlj9syTLcve2rh1Ni2mZmq8LsK+9zmXSh3/cvnHOR+SNZvnczMAkoBF4Cpgfobwww25suB14l+aLsu4DPdNJ+uhdTETDRizUviLiBbUBF2rJ7gDu86TuAr3rTlwO/I/miyDcBz3nLhwNbvP/DvOlhPm6r14EJ/dFfwHxgNvBKEP0DPO+1Ne+zi/sQ19uBfG/6qylx1aS2S7PTqf9M37EXMfm2zYBHgOu96e8Af9/bvkpb/zXgzjD7ymubSRf6ff+K2hl62wurnXMNQOsLqwPBObfbOfeiN30UWEfy/amZWAI85Jyrd85tBTZ5MYcV9xLgB970D4CrUpb/0CV5FhhqZmOAhcDjzrmDzrlDwOPAIp9iuQzY7Jzr6mngwPrLOfc0ydr86f763D/eusHOuWdd8tf3wxRbPY7LOfd751zryzKfJflWsIx04z/Td+xRTF3Qo23mnVm+leTL47OOqbu4PLvXAT/tyobffeXFlUkX+n3/ipqgd/bC6q4E1jfMrAY4D3jOW3Szd/n0YMqlWqb4gojbAb83sxWWfPk2wCjn3G5v+nVgVD/E1cr1dPyx9Xd/gX/9M86b9js+gA+SPCNrZaKZrTSzp8zsopR4M/nP9B17gx/bbATwRsoBy6++ugjY45zbmLIs9L5K04V+37+iJuj9gpmVAb8APuWcOwJ8GzgDmAXsJnnpFzZvcc7NBhYDnzCz+akrvSN7v+SkemOkVwI/8xblQn91oD/7JxNm9nmgCfixt2g3MN45dx5wG/ATMxucrb0+fsec22Zp3EDHE4bQ+6oTXeiTPT+ImqBn88JqXzGzApIb7cfOuV8COOf2OOeanXMtwHdJXm52FZ/vcTvndnr/9wK/8mLY412utV5q7g07Lo/FwIvOuT1ejP3eXx5+9c9OOg6L9Dk+M/sA8DfAezwxwBvWOOBNryA5Rj21G/+ZvmOP8HGbHSA5xJCftrzXeLauBh5OiTfUvupMF7qwF97+lc1Ae678kXxl3haSN2Nab7ycHaA/Izl+9e9py8ekTH+a5JgiwNl0vGG0heTNIl/jBkqB8pTpv5Ac+76Xjjdl7vGmr6DjTZnnXftNma0kb8gM86aH+9BvDwE39nd/kXajzM/+4fSbVpf3Ia5FwFqgMq1dJZDnTU8i+aPu0n+m79iLmHzbZiSv1FJvin68t32V0l9P9WNfZdKFft+/AhHCIP9I3jF+leQR+PMB+3oLycum
1cAq7+9y4EfAy97ypWk7/+e92DaQcmfaz7i9HfYl729Nqz2S45VPABuBP6TsHAbc5/l+GahNsfVBkje2NpEiwn2IrZTkWdmQlGWh9xfJy/HdQCPJMcgP+dk/QC3wiveZb+E9dd3LuDaRHEtt3ce+47W9xtu+q4AXgXd05z/Td+xFTL5tM29/fd77nj8DinrbV97y7wMfS2sbSl91owv9vn/p0X8hhIgJURtDF0IIkQEJuhBCxAQJuhBCxAQJuhBCxAQJuhBCxAQJuhBCxAQJuhBCxIT/D7aXnA/ISeKPAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGwAAAD8CAYAAACSAEGOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOy9WZClyXXf9zuZ3373W2t3V2+zz2AAcAbLAKABwhRok7RM2rRIUVIoQgxGwGGFbDP0YCn84vCbFH4QSdmizAjJFBkKkpZs2aICMiIIkRKJZQgQg+Gs6O7pvbu61lt3/7bM9EPerp4BpwcDAiCrwToRFXXr1r15vy/Pzcyz/M//iHOOY3lwRP1ZX8CxfGtyrLAHTI4V9oDJscIeMDlW2AMmxwp7wOS7ojAR+WER+bqIXBKRv/vd+Iw/ryLfaT9MRDRwAfgh4CbwZeCvOOde/Y5+0J9T+W6ssA8Dl5xzl51zJfAbwI9/Fz7nz6UE34UxTwE33vT3TeC5d3pDFGQujbrgnP8RARGcgDhwItQNhRioMwinoGqHiQSxoCqLjRRSOVRlDsd1WoFzuFDBmzYSsQ4xDifgQv++OlOIvfs+UJV/HQ6cFhDQuYG6hiDw4ypBHGCtv14l/j0ii4EcTiuqpkKXYAMIcgfWYSOFE0A4/Fw9LUGEeTWkNHN5u7n6bijsXYmIfBr4NECcdnn6R/82VkOdKlTtsNrffNUQTCzMvn9CNYtQBwHxvsJpP06dOnQJYoVs09G4Y6gTRd4TVA11KtgIbAi68IoAiIZeg/NVoeg6yhVDcifABg4xgo0dUgGLaYv3hdYNQ9Xw1ycGTCSEM4uu7inWLV5/Vwl779Hkq4ZwbU65l5DcCQgnULWgalls7EjuaHQFzRsWp+HVf/Pz952374bCbgGn3/T3xuK5t4hz7peBXwZoq75r/d5lqErMcESwtoqbzUEJrqxAa+w/HAMgH3waefUyaI0rS1xVgzWoVgvV7UBd46yF0mvGDAbopT5mbx8AvbaKPRii0gRnLHY8fuuFifhVfleUBmvQjz0MwzFuPAERJMtweY5of6rYeY5oDVqDtUgUAZD9q4H//8efIfjqK/DwaeT2DmZ3D5Vl2Nns8KP02ioYgx7n953c74bREeCNjr+AV9SXgb/qnHvlfu9pS989J3/hO3odD7I87z7HyO3/6WyJzrlaRP4W8FlAA//0nZQFUK802PnJjyIGqpZgIqgzRzATbAj52ZK///F/wR9OzxOK4V+98T6qMmC1N2Z/knGqN2RWhey8sEa8L+Bgvu73JBeAjSxSKVxmoBKCiSaYCnXqqDuGqJ/znhObXBksURm/157qDNkct9DiCLRlb69J9mpC1XKoUjCpQ1UQjoRw4hDrzz7cve1QHOx9pOKvfeB5ElVxYbrK5y89jC00Safg7NI+ShyXd5aoa03yYubP6V/50n3n6rtyhjnnPgN85l2/XkPRE2zgH9cN688OBXXDEWYVGkdtFcvxmJXWlFjXFCbgqbU73Jp0aEYlVccgtUYZwWQW3S1xuzGqVRFEBlNrzFxjT1ZUd2JsCAisdcc82trh9qRDIy5pxzmzKmKjM0SJYy0Z8wfVGfJejNP+7CFwSOG/UCb2xg/CoXEjFpyCIK35kfaLXK1WmJmIh07ucnVrCa0t3XiOwqG1Zbk9ZaedgvPGyf3kz8zoeLMEc8faV0qqpsYpqDJ1aFE5BcOHG/zj1R/g1qBDElUM7rRRU42sFlwfrAGwN1F0bwrxwBLOLdNVjQsydOGwOiFfEpIp4Py4wQzqFEBzQy2zudehHkUEQ83tzPpV1DJIqXglMURbIWtfteR9hc4FG/ovWDx0h
FODGIc4kNriAn+uOYHNLOUfnv0UtycdblxfJtoKCOeCyuEP3pvgrBDshtxOmqy8DuIcm/N3mKvvujbehVRN4eYnQ1wAda8G45C0hkkIzYqnz93mH53/l7xa9ljSUz577r1sVy0eTbfYrVqEYpjZiP/3ynsZXm8jTkhOjxCB3CjSuKShHM24IK8DhtOUfOqNgtWVEX/t5Nf5aPMivzt6EuBwvHPJLptllzPxHp8fPMKXmo/jGhVoB0aQwMEwJJjqhfuxuKHFShMDj378Kv/rmd8iFMVnz6/zynyD37nzGO0457/Z+B1uVX12qxZDk/Iv1HMgUP27+8/Vd9zo+JPIsdHxVnkno+M4+PuAybHCHjA5VtgDJscKe8DkSFiJEgYEy2sQhrj5HIy99784wjUzpDaYpRZ6fwJlhSsrzO4ueqkPZYUZjQjOnsZNplDVsLoESuE2t5GNdWSWU5/oof7oEqrfw7UbsDOAqgStqR/30bTg5StIlvprGY+RZhNXVdQPnSC4vIlkKW4yw02nyKl13O0tVKvpQ2V1jSjlQ2PWIXFEffMW5pPPEl+8g13ponaHuPkc6bRxSYTkJVS1v56tXRCF7N9fLcdW4hGUP9XQ1J9EXCvDfPBZVGUpuyEm9ukQVTuchryr2f2gJbuhqdruMPwj1oehwrGPKnTfMETDGhMrqpamSoWgcFSp4AIIJw4bCOIc4cw/rmOh6ArzdUey7SMXKLAagjmoEqYbjtZV6L5R+rEbClWDDYV43weZlVlcl3GHB40TYfBYzPBxfx/hUJHdcegclIHROSE+gKLnP6f/uk8Nud/+4n3n6niFHUE59sO+h+RIbIk0Uuyz3wdaqLIAVXmjI5jVlJ2IsqOZLyuwYBIf8onGDhP5raVO/Va29vwMG2nEOqYnI/+6kaHOFOluyXgjpk6EeOwIZpaqoTCRUPSEqgHpjiOY+x3HxIKJ/eeUTSGcQftKjlhHvhyRXZ8y32gQDUofQzQWG2r0vMKGGllkm/OViN2nA4olS/sNRTR0tG4WzNYixmcUyY6jToU6g+WX/FjuC/ffEo+EwqS2RLcPcHGIdDOCgxnUBgZD9KlVoMngcUWyA9mWJZpan4rJFOHUkvc0yjhUaQj3pthGTLKvCUclxVJMdqcg2B6RpkukN0ZU/QxV1GRFTbHWwCQhYoV0339RTCRkNyvyfkA0NFgdEE4t4Z0h9XKLeL9CTeaEk5jo9oGHAgQarRQoQReLtLZz2HCJ7I6mdwE6//4S+fvP4JTf7YIpdK4WHDwcE40huTUBeAvM4RvlSCjMBQrTb+IChaot+ak2qnaw0cVqj+eI933qZfioIt0SkoFjdFYRjYRgDlUkuEBRdzPEOb8SVmKiYc1kIyFuhdhQmJ1rYwNB55pwonHK57NMLARTw8HDEbpwmChEjGP3fSHRyBHOBdtt+GtNNfZ0j2hvhllqQW1xocLGAVL71JAqar/ClkOqhlD0hc4rPQ4eCsl27cJwcuw9leAUmAjQ/h7uKvTt5EgoTCqD3hv7VWUtcdFG5iViLMzmJP0OVvdp/z8voPs9Rh89S+PGjMZmQHR1F5cXuPEEOX0Smfn0uh4kmE5KcHuf8KCDTOYwnECvjW2n/vOqGr3cIX0jx/Qa6HFOcktTrjYIZhV6lJNttYgGuU+fTOaovQPC1SVsFiOzAqkM1AYpK1DK+4Ba4QKN1Ib2pGC2vEQ0hvlGi96FkvjmAezsM/yhx+m8sMX8oT4uENR47t/3Jj/0j83VkbASW6fch575m9hIY0Oh7HhUUjixFF1N1RD2Pmh8clAgPNAEY49YqjOHzgVdQOeyId0tsVpR9APyjiKaOupYqBoe1GMjIdm3hFNHnQizNUWdQdG3RANF3XAesFOCnnkAT505uq9D93JB2QmwgXcndOlXsqocwbTGKcFp7zZIZXFasfn9KUXfYVZL1G5IMBPal31K6eA9NVIq9FyID4SlV2qkdrzwe7/IeHTzbZfZ0VDYsVn/F
jk267+H5GicYWmCfugxAKp+RnCQMz/TIpzW1InGJIrbn1A0rypU5XGIuvQgmDrxZrnOHb0LJcGkBCWIcdTNkPlySDizlC1NNPK4wiB3hJMaVRimpxLioeHOcxHZbZ/mL7oeONq7UDNd1x5WMHf0XvKWbN0IFyBSv/UFoxypDE57AI9UNS4MkKrGZjHXfqxDvlbTvBIQjRyt6zUmUcxWFMnAMjmpCSeOpVemOBH42ufvO1dHQmE2VOQnW9RNf8P5akwws9SJps40875CVWBiGD/kCEeKaHgP+JLdsSgD4zMRzdsezFk1ve+mc2+mly3v48VDS5UpxGrK1QhdOnafjnDKj1W2vU8kFvaeCqjajuy2kBxYZmfbHhVcOMqOV2Q4s1TtEPChNBv4L8tdfONsLSRfq/1eJiwwKiE4mK17X82GUHaF2YkUXd7DhLydHJ9hR1COfPBXkhj16BNIVcPuvkfVthuwO4B+B3YPmH7kPLqwRP/ua/DMk+ibO1QPrSOf/xqz//I5Wq/sIkWJyxJvwtc1rtfGXriMPrGOi0MYDJF2i/z8MvHmCGqDu72FnD2FefUC+qnHsG9cQ2UZZjhCve9xZFbA/gHVE6eJbu5jmxkuDVGzcoG5Fw8aqmowC4dXBAK/W5jXLjL9S8/RfmmP6WN9ooOK6NImBAFuNsPs7aN7PeYffpj0+tC7CNd///5zdbzCjp4cW4nfQ3IktkSaKe5970dVBhtqv9WwKAWqHbMTMVUqxCPLwSOacOxIBt7BDXJH0VJkO4Z0c0rdilGld7LLbkQ8KJicyahjoXtxSr6cABDMDXWqfUhqZpmvBHQuzSi7EePT3pqLJpa8qwlnljoRGpsV4aigbkWL6hpQxiKlBeWNDKm9wyzGIg4mZzJ2nlUku4KqQOeOdM9SNv1aUbUHy+rS0b44xokgr93fSjwSK0yMQ88rH0NT96wsX5/laL9+wMHjsPUhhSo9rDvdrahjIZxY+q9MsJEwOddEFQaxjmI5pmxr6kZI41bO0vPb7D3dQNXOuwupJtme035xm2BuiCaWze9vMFsJ6F0oaNwpybsaE7FQDISD3H+hRNCVRReG4CAHJahZhSqN/6mtV1hR0f7c6/Rec8zXHFUTwpk/gtL9mvmqEM4t0cQyPaGpOzE2C49+LNFpwWQhLlRUzQCnhDoRkr0amyr2n0qpNwrCazHT05bWVcX+EzFlG2wUwLkmRU9o3fAmthjHvK/JdmvmKyFVFlE/kx1mnG0Q+JRMHDD60BrjMwoTQ+uaY3pKqJoJ+ZLPOAdzGDyuadxylCspZVujKkfZDImmliDVBHODzUKwDhbpHRcGKKXY+umnGD0MpmFQpcaE3sWoM03Rc+y+J0CXkG47qkaALuxhTdrbyZFRWLHkc1XioI5l4TtFhDNHviw0WjmTU4qoUTLSGekdxex8hbwRYlKwoaPoKMQG6NKBwHgjQBfetxIDZUuYbAQku97pnp5IqVPv35U9y34bwrEw3fDVmShhdsqi54LUQjTVWC0ULUVQOPKuRqeKeOQhDSZWi4pKv4rEQL4iBGfHnO6NuJqscBD4eGLdcFRdgxhBTxVlB6Kpos6Uj1XeR46EwlRe03xlGxdobDtFHUwpN3qEuzNcqIlGGVthl6wEVUV0dn3Ja+u6pmo4mrdq5ssB7Ssznw/LIkwzQhZOqKoMalIyebRD+pXcT2rtz52iH1M1NeMNhS4g3bMEucUpQeeW0dkAVcHSyxP03gTbyZDaYhoRelIgeXVozoux9woCawNhAKwy2WqyudGkfwuyHYsJ/ZY+ORn5xOmeJdmvCYcFGOdLc+8jR0JhVStk9z86gVN+NaiqRZ0KYhLEwfCJmr/y0d/nq4PTPNbe5v+7+BR1qYmziqrSjJSlmY24+nqfxs2MYOaYnParyobOR9zbCWouqDJDF0I09NUrJnNUpwuWl8bMioiDQYqKDatLI3bGGXWt0dqy91xM56UmdQqqXlx3MyPZg3DsDQfwN
WG4BSgHGDwp/NiPfImtokVpA55/6RGCoaLuK7qrAyqjGWw3IIbOV9uIhfraMcztgZIjH+koTzS48bMfw4VgEkedWVzokFJB4Fg+t89vPv1/cMM0GduEX9/+CEosZ9IB+1UDhWM9HvJrr30Yd7nht7zzU6o8IGmUiDji0C+Lsg6YDhNUYLGlJswqPnjmOp/oXeB6scRm0WE1HlNbhUGxWzR5tn2df37lgwxfX8J0al9uVCnQDj3SRAcKqX1GXAyH5UZOwfInNvnFx36D90Yhn5vH/KNbP8ilvWVOdYZ8aOkat/IuANcnPa69eBKphfLn71+BeWRXmF5bxWxtv+3r1dNPIDc34cQq5vVLby0ifwdRWYYrSxCFq8pv+vrg7Gns7j52OvVPfGPB+rsU/fgjmK9fIjh3BnPrDipNMKMRenkJs7tHcO4M9fVbYP3Z9U4r7JsqTET+KfAXgW3n3NOL5/rAbwLngKvATznnBiIiwC8APwrMgL/hnPvqN7uhTrLuPnr+Z5Dp3McQywrbbaDmFTYK0IMxN35ig2zbGwPB3BLOFoZBYZkvBUQTS+PCvo8nRqH/SQKkMlS99O7NgoVgXGCTAD0uqJYzqkZA2daEU0vRUcRDiw083USQW0ykiA9q4tsTHzNUgkzmmNUOalJ4o0MEqQ0u0N6BtguOjpMd6jRg/4mQcOzI9gyNSyOK9QbiINqbMz/VAAvZ5QFS1Xzh5q8xzO/8iUNTvwL88Dc893eBzznnHgU+t/gb4EeARxc/nwZ+6V2Mf4+IZBEwJQxQwwVyKlAU55c9r0VDqFMIJ/aQxMQGgqodVbYoU01jP2FF6QEzzZhgWnmr0DhMrKhbMXp3TN1OMKEivTXxhkJpad4sifdLTCREY4OJPDkL1iHW+hxXI6Y+1Ucqg23G3hp0DttI/WOtveKUIvr6bSYnAlo3DfHIonPH/HQLGyucFjY/3iWYGsJxheQL5b/DGnpXW6KInAP+zZtW2NeBTzrnNkXkBPC7zrnHReR/Xzz+9W983TuN/+YtUYIAV9ff9JreLLrdxkymh1vK297D3XHftK1JGPmtccHF8WZRjca9rfDN48Qxrij++LiHb/zjY32r8t0I/q69SQl3gLXF47ejLTr1dgOIyKdF5Csi8pWKexPwrSoLwIxG33SSDsd90xf08Bx7m/e+nbKAtyjrLeMevvHbU9Y3k2/bSnTOORH5lk/itzDhtE4586FnAXyBQlNjAyE6qCk7AVWm2P5kRbQZHiKkdO4Dv3UimMT7Rr0LFTq3mFhRdDRVUwinDl36ADFAcuDPqXBqicYelbX/lEdHRQPBaQhmPvpRtRzJnkcVx3uO3qWcohcuqBmEcGJwSlDGoQuD1O6QX0qsj5ZMziTc+YQl7s+RV1u0rrjFfcLspNC46ZidEEziOPGFGl0Y+PJ3Hvm7JSIn3rQl3jXn3hVt0R+TaU700lVQGjedEhmDOncad3OTdGUJe2ebpc9kTD/2CPFugX75MrKxjnntIrrX8/REa6uHdEWuKEi1xo7HBKc3qG/cpHN6A7TC3tkmKwpvmV25RgR0ux0IAqr3nCH8o6tIswFVhVvqIqMprpliXrtIsL5GMJ4gaQKioNeG/QNcUSJB4CP2d0UEyorexYTeZwrMwRD1vidQgwluNsOdXsf+E89IuLK4B9VqIVoh8+KPz9HdYf+EZ9j/Auw55/7egsCy75z7H0TkPwP+Ft5KfA74Refch7/Z+MeO81vl23KcReTXgU8CyyJyE/ifgL8H/J8i8rPANeCnFi//DF5Zl/Bm/c+8mwuUJEaffZh6peW3mMpQ9GKya0NMM2a2kbH9rCc0aV13JAPDvK9pblYMz4ck+57wJNu2pFs5qjSU/YT49oTpw22iYe2zAIu7jYY1OKgzn2LZf7pF2fYFEb0L/gyarmmCuaNxp2Z0LqB7sSTenTM73STeL31GIdMEM0O4O8GFekHpJ8iCjg/ApiHXf7hNNPJw7JWvFYiD+UqIL
hyTU5p01+IUtK7lPh754v3zYUfCce7Ea+5j638Vl8beHG9m3qRXgtMaKUrMxcvUP/gBxmcieq9OCG7sYJe6uFCTn8iId3NcoAje2EREcP0OTvlzq+4mBJOSqpcQjAr0YIrNEu+nlTVyewd7dp35iQbh1Csz3J1hX34d99H3E+xPqVZbRFe2PS9iHCLGYvpN9PbQl76Wlb9mgDjy23OgcVFIvdKi7IaMTwW0r1XE23P0zgHl+VVstLjGTNN8eQsXBnzx6q/c1w87GgpLTriPnvnrHpsOsKC0c3GIFBXFiTYHj0beGIi8gaGMI5w6ZiuKpZdn5CsxyW7pI+i1xaYh+tYu1fk1bKQpuwGqdCQ7OXpzH7vcQYqKut9gejJmclLT3LTEBzWz1WABV/PGh6ph+cv7h5lw2dzDnluH2oIW9N7Yk2kuiDlxzq8UY7CtBnd+oE+ybxeJUEd2O2d0PqVq+EKO5q2S5Oqex9WXlXecy+2jG0vE1LC9h7RbYAyuNlAUSBQijYzw89dZ2zxD3W8gn/8a6v1PwuWb1N/3CK3fu460W8gXrhI8dA57ZxunNbrdgjRBf+0iQatJOBzhjIWnH/Wm/cVrcGIVPS3pfnmf1nIbdTD1KKqPPEm0PUUNRuSPnyD+ykXYWEeNcx+FmU7hhdeQIEAaGYQhFB7AetcPc9Z5ZNSdbU6UFebCGwQbp3yB+8Mn6L4+xiYhdSMg3pp4XP7mtv+yvsMiOhIrrNXdcM9+7L/DxL6YwIQ+kRmNDLOVgLwvjJ6uQBzJ9QhVQbLnqDNP0FUs+Xto3BA61ypMpJgvK0wsi9Ih/ztfEpq3fD4KgWhsyXuK6SkhHtwlC+OQ7dSkjnAkHoyaQveSoWyqQ9IyJ3eJxuxb6pp9vs0nUQ8eDhk+XSGpQW1HNG4oXyTYEWbrDl2A1EIwg95F79O9+Du/wPjg7YshjsQKU4Uhu7CDS2Ns5ON/aMFFAemVOXsfXkFeCokHDl1asu2S+XKI7PmIuLoMJhRaVyaovELyivBc77CipOyExIOSfDkiXVCl3cWLqDNNnFIkB17pzRs5B4/6THT/9YI61ZQtTfeFHR+fTEPqZkh8bd+fXdP5vTPXeiyHU+J/hwF5b4XoCwHztZDWdUv3s68x+sHHSAag54rexZKiG5BtemMJ41DFO0RsjsoKe/8P/vfYQKgyhS79qgjmlrLlK/Z3PmzBCC62xNsaVQl6DnXDswiku76ENh5agtzHGOfLinjon68yfybVDV/Al+4b8o72K7QvTDcs4UgIZn51W+3PLhP7JGgwE9afL6ja3qm32q/uaGoXlqHnAb67Ou+mWepE2PqIYFdKwmsxwUxIdzxy6s5zCgGiA+/8dy/4Qr9X/u3PM9m/cXSNjmM/7K1yDCT9HpJjhT1gcqywB0yOhJUoSUxw4gy2nSGVwbQS6lZEtDWFQFF1E/aeTjCxP8yXXvX8wLNlRTTxPBpBDu3Lc6Jru9h+i7qboOY1xXKCqh1FNyDdLj09UjsknNUEgzmzM23yvqboCNHYw7/jA2/sxCNv9gdzR7Zdk9wcUXdTXKjQo5KqnxDtznxytKrvJWKtLyqXqqZe77LzTIOiK+gSOlcMTrwDbSLlsw0xJANL6/WhLxK8eP/qlSOhMBtpZo+v4pRgUoWeW6qWxkQtbCSMTwWMzzps5GjcUGw/EyHWpz/yud8kghziYUTdWEeMo+gFVFniowu17zChqpA6udu6AUZnepjId4cwieeOsiEUHV8RefCw9qmbUhAbUDV63o2oHdWpmGDuqJotgukCl7goMBTrO1ughK0PRhRLFicOVYsH7DjIl72VWTUd0UgoWxoxHf/eG/q+c3UkFCbGV+ED6NJDneNBQdUMQSl6l0rqZkTR9wRfvQslk1MRbAHO+YpJA/FBhZTeF6paAUHuaN6Yg4j3wbZyZidTwlHtQ1szRfN2jYlj6hpWX8iZrfkvQzg1uJse5z/ZCGhfnnmfrhcRjmriPYdJPEybu
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
[base64 PNG data truncated: matplotlib figure output]\n",
+ "text/plain": [
+ "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.539]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 9, training avg cost 0.538872\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO2deZgcV3W33zO7NNKMttG+jCRrsSTbkpBl2Y53G0s28QIEpA8StmAIcVgccAQkwpgHAuYLEBJ/GBMI4ICNIZgoHyI2NsY2BmHLkjdJ1i5ZkrXv+yx980fVjLp7pmd6Zqqqu2p+7/PMM1XVt885fev2r6tunTplzjmEEELEn5JCByCEECIYJOhCCJEQJOhCCJEQJOhCCJEQJOhCCJEQygrleMiQIa6+vr5Q7oUQIpa88MIL+51zde29VjBBr6+vZ8WKFYVyL4QQscTMtuV6TVMuQgiRECToQgiRECToQgiRECToQgiRECToQgiREDoVdDP7npntNbNXc7xuZvZNM9toZi+b2ezgwxRCCNEZ+Ryhfx+Y38HrC4BJ/t9twLd6HpYQQoiu0mkeunPuaTOr76DJzcAPnVeHd7mZDTCzEc65XQHFmMm2P8Cm33jLpeXwpvdBv3Zz7HvOCz+AIzu85ak3wsiZ4fh59eewd623PPEqGHdJ8D5SKfjjfXDqEJSUwsx3wYAxwft5bRm8scpbnnw9jJ4TrP3GU97naDgJFdVw0YehvCo4+0d2wKr/gFQzDD0XZrw1ONs7XoD1/+MtT78Fhk0Pxu6Kf4ejb0D1EJh7G5j1zN6WZ2DL01DRF+Z+yPvfXV5+GPZvgOEzYNrN3bOx/jHY8bz3/Zt6Y/dj6QUEcWPRKGB72voOf1sbQTez2/CO4hk7dmz3vO14Dp7+KuDXce87GC78QPdsdcSpw/DfHz27vu81eOcDwfsBWPo30HDcW972LLxvWfA+9q+DRz99dt1K4Io7g/ez7JNwdKe3/MYqePfPgrX/+h/g8bvOro96E4y/LDj7L/4YfvuP3nJF/2AF/el7zgr6kR1wawAnsycOwP//+Nn1ydfDwPqe2Xz8c7DzBW952Hkw6dru23rkQ+BS0Gdg9wX9V3fCoS3Qf6QEvRMivSjqnLvfOTfHOTenrq6bR9WXfgzuOgyf2uQbTQUXYDotdhfcA8NmhOcHvKPBSz4K46/wlsPyAfCOBzLXw/Az+z0w+kJwIfhI+fvhui94/4P20dIvl/xNOLZHzvYENyjbLXbGXHTWR09JNUP10Ez73aXle5PqwfenJYYwxlPCCELQdwLp5+6j/W1CCCEiJAhBXwr8hZ/tMg84Etr8uRBCiJx0OoduZg8CVwJDzGwH8DmgHMA5dx+wDLgB2AicBN4XVrBCCCFyk0+Wy6JOXnfAXwcWkRBCiG4R/ztFnYvGblh+POM5lsPyEZGfUPos7P0SZvwh2A67j3ti3wU0rlveGup3MBnEX9BF76SnudaJI+D+UP/GkhgLelQDzqLxZRbNlygyH2H7ian9sPom8P0qQY8jMRZ0IYQQ6UjQhRAiIUjQhRAiISRA0KPKCgkRF3ZmSDt2I/ET
[base64 PNG data truncated: matplotlib figure output]\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "image/png": "[base64 PNG data truncated: matplotlib figure output]
0jfRT8WgKCymFjTuh6QL4POhaWX3WJsSzQW8p7nGWlsGpyCdLvA3a3gvN9cHZv1R0+OzfrvITlW2AMmxwp7wORIGB2232DyQx/xTXJiH8tzylPgOQWzk46zH77JnVGL6TBFBiFS+/gc4wDXrCHXdF8K0IU/2Oer3goT5+N7TnnWT/C0RKry0fb5CYcqQb9nxGynQTDSmPUCN9ek10Pmp2vECMkdTXbHLcqK7h0vyb6vtIR7TXLuNq1zCu583LF6fg/rhMGwgbudoOeeIbU+UYI4nFGE2yHtS4v7/pd/ys1yvlXRc0Pn5X1wjmqpgZ5WVP2EcFRStSKS/ZBrboN4IHDGEB0o0m2HfjWiagk21JgYOlcqoqHvchfOEpJB7fu5yF0CFkW6a2nczDFZQJ0qWreE6aomrzuEiaNxW+BawuS0I9l3ZHc0dUOIDxy9V0bYJPTX2QoxiSLZKQgGM598vdtdsKz84zDAyRL7gxWqtiXZ0rRu+FLcybpmXsbEB34OoqGjc6VArEPnR50c7NhKfIscW4nfQ3IktkSyBPXYk7hA+eQlC86pqd9a5icypquabM/nsJq3SkzqowHh2BeY21jIbs4wjZDgIMfGAS7U6GmJVIbZ2TbZxT1cGpGfaJLeHPsqynHO7FwXVXlnu3FzRr6cEB2UTM6kpDsVNlKHQFFVGZwINtFI7VkD9LQ4RE1RW+9Ah/76yqWU8UZE5405042EzmtDXKiZnmmQ7JWE2xOc1sxPt4hGvrCdl48415QLFdPzLawWyqbnXvJVjjFOQZ15bsE9gKBmbxySbitmG4ZwGGNih9RCPGjTumFRqxHzJbUINWXEA8d8VZCn1glmgMB8pecTpo0WRRfmp2vUXNDzJuI8V5UuBKkjzzBaBfRfWfD/zn3oqegKycCi6pRg7kE2uHtJTKeF2bJi/1nD+C8atB4z/ErPJz7HcPOHQvR0CbGCiR3LLwSIBXPpqIemakeyXaAqi8kC9KgkP5mhc4sq/Te/bIW+wLwDa1+pma4KjVteuXnf4ySWXp4tqCNAmehwfBMK7a+U5EshVSYkB+YQal22FI0tmIwD0h1L1fRkYo07fqXkXbWYYEeyV9G8UVO1wgVUQRFvzxfEX/faAt9tF2wjjZOY9GaAXGvhBBq3HeHUMlvRnPkMzJc834cNhM4lD1/Qxf2NjiOhsDpT7L4vwwWeSc2EKapmcSMwe6zgP3/vl7k0XuFUNuR3H30UgCiuiAJDI6wxVgxVPWcAACAASURBVHHl/BLZlmdzy1cdUkPZN+ipwjQWWxaOaKCR2gea66YlWJnzxIlt3thdoioDOq05BDXTMmQyi+m25mzvN2l+NaVqxoeryGlHOI4IJ/faAnuOEX/tTuDgvZaf/NgX2Mzb7OZNXr+xjpsFqMac1o8fsBwV7M8zxpOUst3yO8rrDwib22H5DxxWJ775uW8qf8IKyfuJ7nYwB8N3/sgwAmffWsWySLPIoh/LtypHvsbZdjPyj3sIftVUlA259y1VMD0ltD68w96giZ0H6ANPxWDPzXHbCbZdow4Cuq97KnMTCrNVBcqDaMBH83Xut9R0xyu1aghVC89nGPp6ZRs6XOgO65QRkEKRbila1y1V5qtnTOyvLxo7ook5XGE2FFTpx3cB3P64Ijo74UR3xOUbK4SbEboQqqbDnchxtcLlmngroHvBI67sbz2ANc5/nuXYD/sekiOxJdLKqD7yAZ+JXQp9k4HFwW5Czztlf2KP/VtdiCwyDgimPlbnQocqBJ0L7auOdG9B79DU5D3xOPwaio74DHFLaGxagrnz2yYwWxfyEzXRnkaVHumbLzuCuW8d4gTifWHla4uODwrKlkbVHvWraufbjyy2ReDQWpycjNh/D9RrJZSK5qXwkEtq8nAFAun1kKrpWP8Di9XgPnvc3eiBkiNvdEgUoc895FkExLezsEmIPphBGDB5tMPuewP6rxnGpzQrL+bUqSYalpS9CBsKdaJItyviLd/Br
15uEVzeZPbsWWzs8fq6cjQvDrFRgGmE6GmFaYTkqzF519M9+B4uDpMowlFNdFAweqTpx76+j2sknuKhrKlWGgSDOWpv5NkD7nZ0CLQH4ijF+H1rjE8H5Euesa3/WkE4mJMvaB+CSUXdDIkGBVJUvgPFpWMK2QdKjo2O7yF5NyWzp4FfxVM7OOCXnXO/8J1kw6lXGuz9+Ed9DE7u9e7SBVQNmJ2r+IkP/iG/feNxPrh+g9+78jDWKsKoJgprijKgGMck1yPigfe5qhbM1yzBxHMTusi3ow+mgkkd2aaiTqBuOOTslFNLQ7ZHTWa7Ga21CbNZTBgajBGaWcFwlNH5fELV8JA48LMRjSGYuUNO+rukKHfjinsfMPylj3wZgM9ef4LJMEXt+Phk+9EB03mMs0KcVKj/4Hmn6l/7NvywBUvACefcV0WkBfwh8F8AfwPYf1Nhes8593dE5EeB/5Z7hem/4Jx77p0+43hLfKt8W0bHgkBlc/F4LCKv4clSfhxfrA7wz4DfBf7O4vlfdf6b8CUR6d6liLjfZ0iaoB55AnEO04iwkUbnNSb1tcSqslz7kdZhIV/vQoGJlU97zA02UiSbM9CyIEkWbBRQtzydrBiPIQxHJeGVLWbv20CMI7kxpFppMjuZUHR85B18I7l0t/ZR/aWAbLtGlZZo2yORTepN82Bv6imLWon/nKLC3a1vAx+iur3N9n/1BNMNoXPR0rxVYiPPOpr3Nb2XR0zPNXFKaF72JM28/h1KryzoH54BnudbZ8O5r8Lqhmb/md4h9sJEHoWrar/6x+dh+b1b7A0bLPXGvHFzCZRDxKBCi90NcWlGdiUk3fUYjbKzQPMuttl8xZJuNSh/7DyqhGRXMM+s+jYaHUfy2JC9g5RgJ8I0LNTaB5+bNY2lGfVLHVpXI0ziQ1qqBhs0iEa+F4zcrcF7c0jLOg5+aolzH79GVkXcOL3EVhWgJ4pwosg3SrY/leJyQCzLz3c8ZuTadyC9IiJN4P8Cfs45NxK5t2L/JGw4IvJpPIEYUdYjKDyrTZX5cXXhz4WiK5jM8HBnD4DSaLACRlDNCmcFWSrACuJ8DxST+JhhnXluX1UI8a6i7PheY/Ee2MhnApyGul+ThDVF4jEgKCA2uFohoSXPQ1S8OKe0jyWq2iHGU5sHubvXb4W7xYCLtsOPzNkat3h8eZtbcRe76BdmIkeQ1dSFhsiiY0OdhN903t6VwkQkxCvrnzvn/u/F098WG843UhclexU6N5TdkDpRxIOaOtNkO0K2pXjh1lMo47/JSdPnlSanA7pfd0w2FO2rFnGWbLPARorWTUWdqEUvaJ9yiQc1o7MR2a5BjGO2EtDcrNgpItyXlugsoiuqBFUrBk8I4TQg3neHvL/T0xmq8gqykQ/0hiMfAREHLvCtPWzkG5fWv52AJDz/sSbZGxHhFDpX/Ba7Nc9YfsMyXVeEE0frVrXIan8bUO2F1ffP8AbGz73p+e8YG86x0fFW+XYjHd8P/HXgJRH52uK5/5HvIBuOhCHBmXOeVCsvsZ0mLtbo7SGEAWapxXw9RReW6XrI0hfuwMGI0Q88Quf5mwyf2yDZr1ClJby1j2ukvu7YGG8IaIXtZL7ge5rf43IKNE4rxu9dpfXKLgfPrNC4lYMWylaIOEd6c4LkFeWJNuHuDBSeiOX0Oqqs741V+RimS6LD/mEA259Yo3cpx2pF2Q1I7+S+PAkYP9QgyC2qcGSXB7g4AAty4ffuP1fHkY6jJ0c+lkgjRR5/DwQKJyy45i029L8nZzJ2nlEeBNMS4pGP5pdNIRla4kHN8FxE+5rPTEd7OWIM+cnWovlpzfRkTDT21LBiHfF+gY01JvaNUscbvt+KB9w48r6iectQdBVWC90LM0wWEB7kmKbHi5hYg3WHXWhttOjQZ51HgNWWshuz/1RM0Yfe6xZdOMKpoWpqomHN4LGI9tUap4V4UD4YqCkpK/T+CDcc4c6c8O12JzNUvwNA+6UZwbzPd
D0g3bfE+zXx5ggZjLDrS+TrDfqv+UrI4Pa+Z9MpS8JmjCpq1M0dOgc9bBahpgWUFTKZ4TotouGY+s4W2WMPMz/fI/2DN2B1ieh8j+R3XyJ74iEAyn5K+Psvo0+dQF3dhOU+UeV7NTMYIkGAEsHlBdJs4GYzJAwJv7JFX32QwSO+P3S2OUdfuUO4vYN+5Dwr0yZqUmLasefJCnwrxvvO1fGWePTkyG+JEkfo8494niURbBKgZiU2izzociNl60OKpT9yzFcUva9XvmmNcT7BmQrZtiE6KNF57cnBIu3rm0tLOCo9sDSQwyaoyW55iJEv25qy6eFxzdv1Ye+T2bKm/+qMwZMZOFj9nVuYlQ5FPybenWOaEcFgjuSVZ9ROogWxtPZGSBRSrDUZnYswkW+8kwwN8aAi3Jmy+cllyg40Nh1VJqz//r5P3bx21Guck4DJk33vaCaCW1R+2ACCwoM3m08M2D8bkSQVt9e76FyoM3fIOLP3fkV6J6P/ukHVjskJ7duBONB5RL7kHV7wZbdlM6Fq+r9VCYNP5rj9mPGZgKprCcaKdFu48uOZD0orRzI84Rv5xMJsLaRqQDKIF51vF2WztfNlr4vfsxXN4FNzOq0Zu3st5CCkfTHFhinzNYeJHXVTMIklOej6wo3rx608Hig58ltivdpg+yc/ho2hbPmIgVrAJ5zA/PGC//oD/4GxSfji7nl2Jw0C7YN3zglp5F+8/UdrRAOPpJ2vW2xqkUoQI9hWDYWHvumpIhwryo5/TffEiIf7u1zcW6GqNad7vgZoZ9rAWEW/MePqpTWabwRUi+tzAib2DADhmEPr8m4c8S7/1PgTc37m6S/SDyb86633c2lrmeogAe147KFNaqeYlhG7B02iVzMAql8+6jC3xkn3kac+jRQG046ps4B4d06xnKJzw3QjYfNTNa3XIlrXDbNV7dmvp47ZupDdcYQzR+vqHJXX2HTRNK4REI4qyn5EnSqSvYqiF6JzS3prwvxUk7yvma/43NjyS77Rdjjz2HkTQzxyDB5X9F81NG7M/FmoPU3g3fZRd0UW/BwuUKh5hQsUN360jxiYnbK0Liuatz1MvHGrYPuDnjUu3fEo5aWXJzgtPP/iLzGa3j665GCdcMV9dPmnPFkkYNsZajQD8MDKVsbsbIPG1QlSVEwe6+GU0Lg+4eCJFr1//QqyvoJZahLc2j8spjPdJnrnwHdbSBNPItlIcEqhtwe4ThOcIz/dwUTe54u355iGD8JOTyVEY4PTQnZjghrNvBm/aNXhL9b69sJ5/qZmBgrRyneImM2ZfOJRGldGTB7uYCKhdWWKnhS4KKBuxaDAakV8awjG8MXrv3pfCtkjsSUSBNDv4Kra8+iOc2wrxSmFizVqUrL9rCZ8rMvSSxXB1JAvhwyeaiPOsfOXnybdt7S/uontNX1b+27qCZ/LJvONFuHEcyc67cmVE+0rN/O1GBMpxqcVnSuGyfe1cYG3FoPckvc0dSYE05R4VmDbKTbUqLxetPYw/gvSynyyWSlY+GdSVJSPrTN4NGD/iT5rXy4ouwF1M8JkIXVDUzY1JhLikSFKQpDoHqPO28iRWGHHRsdb5cgbHRIE6G4fV1aoThtXVZitbYL1NQ97K0uK958nvnmAjKeYU8voW7vkT50iubSNWe2ib+54NmvA7uxhp1P0Uh+qGjMaEayvYXb3cM8+ibzwdSQMkHMbPvrxxi3cmXVkmuOSGLU7gDjCthqowQiX59iHTqEu3fRNDdptpN/1q2me4xZnF2W18CX9lujq2reb2jhF/tg60ZdeQ1pN3Ill5OYW0myQP7RC8upNqvPrBJc3EaWQnSNu1re6G+59n/o5cD6bqwtHnagFzxNMTgv6AwcEypLFJXdeXUWXQrVUIzNNMPetMLJbiuyOQxlH0RGqphCNHHlfCKce0CPWswjoBWupjWB20mKXy8Nq/jqzxLua/JSnGpBaiPYVSy950jAbeICQDXyb32jiUb82vIf6v
QvC2f5AQOvDO1S15mC3iRoGBHNZFCo6XGyJ9nz74aU/AgRe+61/wGTvmJH0z0S+sRvSu5GjvyVGIcHqSexo7LfE6QxXFKiVJeprNwjOneHaT29w4vNzguEcNcmp1jvoUcH8bAsxkF4fInmJG0/hbj3ZyhIMhmAs0utAWWHWe97Ke+0Kam2Feq2DHhdMH+oQjXzISEZT6o0l9O4YqQ2zJzxcJf3yG0gUeSPJOdx87v+OI4/6tfYQ8Ysxnna93yE/26Vs+a7sndfGqBt3QGuqx04xPhuz9O9vUp1eQn/tIpKlyOD+mI7jFXYE5Rj5+z0kxwp7wORYYQ+YHCvsAZNjhT1gcqywB0yOFfaAydFwnIMA3V9B4gjbbaEOxpgTfdRo7gGZWtj8gR7B1Hmm7FulZ+IeF5T9lOS1W5DEuDTGXbuFardAKexKFzUYY/pt1Kzw3f8aKbMzbRovXPcxwkfPoG/vMX3/KbI3BpQn28Sv3wbAlSX1E2coOyHp9TGysw+dFgTaI7bW+qjdoY93zuegPeyNukbiyKda+l2G37fC4HHP8dj7ekH8xjZuPIaTa+Qbbd+lfWLJXtkErZCbRzyW2G6ech9+5m++Jc+UrySHpUQmVozOee9/vuJo3BTE+d5bderjcrr0HYPSrdKnUFLNbDUgGRrqxOMdfSbY/47GPmBbJ36TmZ70vFVSe8yHqv2YdSpUDc9uuv78HBcIeT8kmFnqhiI6qD2Fu/L4fTEey69qi9OK+VrMzvu1p1CaK5rXhXDs45117GOZdSaYCE58MUec48tf+d8YjW8dxxIfFDmOdHwPyZE4w2yvwew/fm5Rr+UpGu6y5+kK6gTy/3TEbDdDtyrUtRRV+RaIuvTtMFxqaL0e0rxt0aVjdFofkj0n+x77gfNpkWgB9a4zFlloGD1qULkimAlV22ITS7wdIA7KjiUcC90L99ozlm2/AKKxL0XSlW+zeBd2BwuerDXN+GNzlntjBuOM4MUmwQzKtm9F4kLn23sYaNx0KAPmHbimjoTCVGE8hHmUU641CffnmCyiWI6J9wqmpxLGF9skM8HpkOZ1R1B4UGk4dhRdRbrrUUrNGzlSGdItzexEjA2EaGQIZx4jH+SWcGxQlaXONPPlgDoRei8p6lQIp46ipwimCqch3bUUHV+4135jjskCgmlF3QixgRBMa499nC4yBNYespFKbYn3M6pmSj5PMacdq68ZqkzoXTQMHg0J5h5tZSLoXpwjxqHz+7dUPBIKk9oSbA5Aa8I9j5F3kSK7MaFYTum8uMvk1BrhxJenhlNL5w83mb5njezqiOlDHYKpIRyXqKt3oN/BdFKaVybUrZg69VaYEwjm5nDCk1sTwkmCKmrG5xsEmx6A2rtQ+6KFccXw4ZTlF2eU3YhglBNeHWJXusS7Y1wjQQ2n9/o3hwEuDJC8OkyzJBfHtNY32H9SaNwSTCSkewYbCp0rNSZWOAXZjiW6ue9bA9dHnOASrTD9NijI1zPCUe3RvyLYWFGe7BDMHUXP97GsGgo+cIIqU8yW++jSMVvR6DKklW0gpWV2MllMTk12eUBxqkPRD6hamnCsSW9PyDdaqMqy/2SK0367VJXvblQ1AALCKUzOpBRdIZxl1KfbxDszqo2eh7pFb5rC2r4F6gZQn+pQNgVVC3WK757U1bSuzrj+nzRRNcT7julagCpXfcXo5hGnkHXa4+lNGqAqR93wja8Rn3Yv+iEHjztcaAmmiv7LjslJjQ1ZFH8LOnc07ngi5boVEswtNvQNRvc/tIzVQrZTMzkVULQCXNCiThU28BjHgycdrTd8ia0B4qFv6l2nnjek/3VDnQaUbY3UCTo3zNaSRc2zQSoLofJAUnX3EHPsPp0wesIQ73gl1Il3Q/LvaxDMoG76Rqomemfq2LtyJBQmlSHcGqJbqWenrozfFqxFz1PKbky6HdK8YZmteZ+oc6X2NcSpQlWOcOyhb+mtCcHBnKqXonOLUxCNhXRzzt57myy/M
EbvjTFLLQ82bUaYKKL/omBDR2PLHHaoDeZ+C25sVozOhqz9zg5JFnvwqBa6L88XWeaFa+R8q2GU8lWYWnHyc3Pa17vMVh3z5XvUErp07D8ZsPyiAYGyoQh3Z75qtLr/GXbshx1B+bb8MBFJROQPRORFEXlFRP7nxfPnReR5EbkkIr8pItHi+Xjx96XF/899J2/mz7u8my2xAH7QOTdZ0D/8voj8W+BvA//AOfcbIvKPgZ8Ffmnxe+Cce0REfhr4+8BffsdPyBLkPU97WHYc+O1wcQ6YWFP0Am59yiFWoBZaV/3Zkm06JqeFbMstqBqg9/oEqQz5Wkbd0OjcUnQ1dezp9+KhRRf2sBxIz2rufDSj6Dm6X/cwOxuCje811MlXHCsvOFpXpszXU4LZooWicT5ENioPqdD9RTtvfBjH5NEO+09o5huG9KYmmEH3ck3RUYzOqf+/vTONkew6z/PznXO3ureW3rtneoYzXEWKFLWQ2qIglmPJO2xncZDEcALHiIMAAQzkR+wgQGAECeD8ih3EPxLAQZwgseAEMeI4jhLbkm0YliKRkiWK0nAZcmZ6eqZ7eqnqqq6qu51z8uPc7pkhOcOhSHp6hH6BRlXduvfW7frqnPudb3lfggl0L/nzpes5qra3bZl9S1OiiKTAHwN/H/hfwIpzrhaRjwO/4Jz7PhH5P83zL4hIgGfJWXS3+aA3mxKDU6vUl19H9fEdi7dd5iYiGk8K9hDwK8B5YOCcO2jGPaAnghuoixpj7gHzwPZrznnIhJMEXfSDD/rSsSzxzd1KUJMSFwXsr6Rc+qcnCbcDkh0vaR9MfWA23rMUPe/dta8Y0rURAPlKho384je5VmBagQ/8Ks98HQ5rUL4taO/+GJP4KIWqveutC8jnhLjvIxjKwPwzfapF37DhlI+cRDsT7yRU9WF5m1jnswzWUq7OcPXjCXXmiPtCZ82iS+94DO/zX//B/7L4Fd8AIs+8zaZ055wBPiAiM8BvAo/eyXFvcs6bmHCK0zOH6qy6shQzIcE0ouxoyo5CjzztkIl8NfABBXmdCq0di4l8BP+A6bNq60Pu4MmJhDrxihBlWx/2b9UtQRkf8Uc1U5z1mmAmFqKhpziS2lMplUuZFydIFEFu/bIgaqNz06hBNKoQTcRejGPwUEy+aHGRIxzpJuTmqd6DsZ90VA0BjqobNuIKt3Yt3pJb75wbiMjngY8DMyISNKPsRnqiA+qiy82U2AMvm3IrSGmI1wa4KIRAYSNNqzDYWNOaGrK1GjFtbOAXz/Geo7NWMLwvJtz3vPE2FML9mmBniliLns8wLU0wrlGTimo2QRkLFtJX+thui6oTIdaRbJXsPZBQtYXupYrRakC2aUhzg64sdaL9InlYoHNNa+IjGXU3Idyd+H6w0fS60IFrRlgY0JoL6X7WsPVkRHvdxznTjYI0EHYfTZh/bsJ4NUEP3PV7o731bepOvMTFZmQhIi3g08C3gM8Df7XZ7W8D/6N5/lvNa5r3P3e7+xfg/8ndAWpngBrso4qaYG2bcGNEeO6yzzVpWPrDq5Q9/FR0ZUi6VTPzrSGqsFQt5X/dVY1MclRRE+3mANg0JN4YUWUBrbURxWoPqQzxuhe6VpUh3arJNgzJ5RGd9dqPmkAR7BW0Lgx8jmswJrw6QKYlMi0JRgUyniJXd3D7E98MkRc+rzcaw7Ud0v/+/9h9T8TsyzVVJrS2PReH1Q2j94mEuF9jQyE+f4344g5SVrf8qu5khJ0Afq25jyngN5xzvy0i3wQ+IyL/HPgq8KvN/r8K/CcReRnYBf76m35CoLH3n0RNSqqFFDWtMWcWqdsheqlDMCooesIrP3nSU+cNDHvvX6DMhOF9PbprtW9Atw6XhNh2jG0FTBci0o0CPS6ZnO0Rb+eMH+w20YmI6f0dwlFNMRtSpT7JOXhyhmhoMVlAOLbU7YjBB3vMvDSlXuriQoXVyo9WQNIIp4VgmOMsoA+Evf2jWlmks25Y/y5F7
0VQ05rxKU8wluxeD2OVbUX5wCJYcJu3ZnW7E4LLr+M5El+7/RXgdaRfzrkc+PE3O+9Nxwh+GtGCKgwqr6i7CSbyo8YFQtX2rNTjVZgsaqKRJd2yTBc0+ycC6lQ8n9NCSjDykXOnwCQaGyUEE0O+1KJse/ZsnEMXnnO+/Y0tLv74CXQB89+sKGY8V2Lr5W0GTy1jEth+f8rSs/sE1/ap5zL0Xk65nBFvT7CtEKc1RN5ILlCovMYFiumpNuMlxfIXHU45dp5s073ghcPDqWN0SjF3zlDMCvq50ks23gZHIoEpDqSscaHGJAGmYfhMLw0JhyXjExGqhtaOIxwJncsViKdfOGgE17lfV4XDEpX7KSXue84Nq4WwP0VVlnDi0KWl6kaEI9+Vufk9K8R9x+yLjdPrIN6t2PoLK1SZkPR9nFL3J5TLHVwgmE6MKgwmi5CyPuzElNqiphU0j9FeRTEr9B9VZFdLWrsWXVmifb8eNC0oZjzFHxZUUd/2HnYkYolOCS4OobYE44p8ISGYGmySonKDCYXx/TX5oqK1CXtnvTd1EAFPdl2z4BXvYYUxdSugThVV6g1mY3XIM+UEnw/LAoZnAtpXDNtPBARTH6DdPyWEJ2Kyq5b9VUU+L/ReMdTzmSeydA4XKepWQLjncHHoZwnAhZ7GVjDYQDO6L/YKF7FjfCKiagt5L/FR+zlBlTA+ocg2rCeUKX2m4lY4jiUeQRz9/rAwJFhc8aViJ5ehrLC91K+LpiW2k3gqo1bI5sd6LH517HvD7usQ7xaMT7XonNtD1jeRXgeqGrs44+8lDQWSGk6R2mB6GWKuh5ZcoDBZhEkC4s19v+/eBBeF5KuePDnIDeHWxJe5ddu4LPGe4v7ER+ujEIrS940duPbGQBQyeXSZshfgFLQvThFjEetQa9eYfvAM03lNsmsIRxXRhS1fQnf51vex4xF2BHFcNfUdhCMxJdYLGTs/5pUh6lQOK5qCKSAwesDw9z75OT6/9QgPdHb4wpWz5GWIUtYLk5caEaivpHRe9QnN6ZJPSAKYlsNkFil8VZTOhWAMpgVl15E9PKAVVfRHKcYouu0paVQxKUMmeUQ3y9lam6X3fEA54+OMdepjj8HEq5wD1ynQHV44tYLtD1m+/+NfYzEa8fzwBM++cBY1CrC9ms7cGC2O4bCFLTSdb0VIDfV/PuJVU0HhmH0x9+5y5Ad9MRMQ7ZvmtebXTnyU8mKbF9JVuucCwgBv2Azi3EscnnymAlfjAiHpa6qWT6nUseC0LxcIJw4TOeI9hw1guqiwm7PszjVxxJFQSso0gtkXLdITRisdFtYc6VaNiYUyUyR7hqKjaV8p0dOG6LLBodKsdYzOpHz2i+9n5syA4Uuz9C4o4oHDhiFOe+mOdiqYFnQv+nurLt5GaOrPBM75X2RuQLzAQDi2mEgRjmqyq5bpIMEmjnDver2h003OSvsI+2jV3/SDsafgS/YsVvsAr6o4FC7oXvJrnXDiSK9Z4r7zhGSlEO35kjMbO8YrimLGF/4c0OsFU0uyZxAD4dQe/sAOxVKdA9uUnDvH6h/luNjS3+z6kvLCUWU+E1BlwviEr3PsXLQk2xWtjeK267Bjp+MI4t5yOt5g0SgffPzm18GtZ3KVJK/fKPKG51Vpil6YB6XR3S4qTQ/f090uAMHqyTu98jfE5C/dWidIz8+95fMdiXvYTXiDEa9evcyNdUS300a2eX5H5wSwkwlMfNLQDIc3vXfwul6/8iYXfHt0/uRVblUDZXZ23/L5jt4IewO8mfj1UYbZvPbmO70FHIkRZmcyxp/+qI8pKig7/kYvBsKJpf+IJvxwHyWOogrIr2S42EJkoVbeXR8p4h2vIItAPqcOxdeqNpQzjmRbCCYHxTM+k1x1heF7auZWB/Qvzvr0RmyRUuHaNTIJIKtJzsdklz2vvXccBBsJ8cASjm8ewXLDiB6tBgQ/sM3efoI1GnUpOVRCkkf2KTdTXOCQUrH0JR8It79z1
BlJj52Om3BvOR3HuC2OxJRImiCPPo5NAhAwSYAyFj32vLnXPtRGrI9eRHsw+3LFdC5Al75f6wCdywWq8FVL49UWre2S8UpM2fGi2wj0Xp4ghSE/mRLuVbhQsfN4gtSOcOLXW+MTimTbUfYaOnMlLDw/PeT3rbPAp1mUr2sM+1NsFFxXCJSmICdQrH1vYHwLiwAAGD9JREFUFxND+5LDJOI5f0PfXlt2hHTLUPQ08dCQrk99euhrf3LLr+pIGEyMRU0K9N4YM5uhh4WvYqp9r9WJ391g8NQS2aalToSip8k2K6wWqo6mfWFMvtwi7OeowT4oRbsymFZI59UxLlDUaUC0k2OTgHo2IdyrCPtTKEoWqp5vJ5r4+vz55wrirQnTk23KnibZLgm3xl4usawR28LGmnB76hOXtUFPCx/tEPHUss3i98yvXqN44jSDhyLaVw3iHOnVkqoT0H1xQt2LiXcr6lQjZU0wsW+vCOfPCjLJceOJT4NUNVLUzfaC4vQso1MKcb6JL9mpKXqaZDunbPtq4ez8gGo2waUJrhHwRgsooU4DwmGJaUfYWCPWEV0ZYFsh1Qmfhtm736dA/EgOfUXUfk2yWzM8GzF4cg65vAnOoUpD2M99hnl/guRFU952Q/dJWUFVM/7IWQYPRT4stu8b5MVYsueusvXhLoMHY0ysvMEO+srsrbtYjp2O20DCCHfAvfg2oBfmMdu3rfS7CUc+gek6KeUnnkYXlqoTHKbyVemwsSA1XPoxC1ZIXw2JB75S1vcoQ9AEf9MrjvYVH/ytW748IJh6XZNw4jCxUKUQTiAeGqqWosp8H1qVerrZ8Skv+eFHG8y+WLP5tCYYC4tfrw5pIupEfF1G7mWzgqlvU1LVDVIeCmys2Hwq9HSxoae5bW05TAxVR9i/z5Je8ZQTC1+vfVL1j79wy+/q6I0wpcEaVJpiJ5PDx+DMaajN9chDs9/tcDBCDkfKQXjqDv7ng2NuHB0qSV4XSTngqn/d4xuMzjsdsUd+hN2Exgi2CRkdPNYX195wv9vh4Ms5/JLewo/z4Jgbp7I3CnsdhMle9/gGhnknptcj43Qc485wbLB7DMcGu0N8O6mQdwPHBrtDfDupkHcDxwa7x3AkvETRCtXKkHaG3dnF1TV6pocZ7BGcOY1Zv3pT0lI/9jBs7WK2d9DdLmY4vC4FBahOBzU/e6jBcnhct4vMz2LW1lGdDq4sEa1heQH6e7hp0540HhOcWsVlLWwa4776vD9vliFhgBnuI0pQ7cxTPiivLotuCFGc88+NQVZXcFc2seMxqtPB7u+j5+eQMMTuDZFTJ5DxlHr9ip92RW4rNHAkDEYQoJYXse0E1esgVY3tZQQ7HcqzC5j3LJNcHDB+ZI5oWDGdDWlbR/HRB0iuTFCVIT/VIdnwxpGyxsQh9dkFwu0Jan9CcXYBGRW+EvijT6AvXMPdtwIXGzLLlUWKk22i3Rw9Lhg9MkvnK1dwsynBA2d9NfF8Fxso9GCMnclg0rjptfF8YO5AB9P48Fig2Xl6gWSwwP4JTfdSTbw5IZ9PSC7sUj90Ar1fMn1knnYcXa9KHh11gssjGpq6WzjyC+d6IWP3Rz/un7c8S6fVTYN4BONThh/5xLOMqoSpCfnaxknySUS7kzPNQ8LQoLWl/PoM8Y4Q5I790/5YmjiqSS06V+ipULctwb5C5748zjw2ptOeUlQh42sZrYUJ+SQiSUvKMqCT5Qw2Osx+NaBqC1XXoae+KzQce2lHVeFplIwvu8P57HH/MfihT32Zti7408Epnn/hFHoYYOYrspkp40GLMC0JAkv4x11fa3mbQtLjEXYE8Y5knEVEi8hXReS3m9fHTDh3AW/Frf9ZfDP6Af4lngnnIaCPZ8CBG5hwgH/V7HeMdwh3SqxyCvgh4F8A/1BEBPiLwN9sdvk14Bfw1EU/2jwH+G/AvxERuR2TgJ3JmH73R3Diy5dVDUVPSHYtkyVN2YXTn77IpIoYlyG7V3qoq
fbazKU6pGyd+XpAMPVspZNlXzXltFfSO2DUtjFEg+tUrzaE0cM17eV99q9lSKlQZUOb1G0CzNoRXQ2Z+4ZP1XjKP3+/TfqeykHnDhv60gAxHNLJbr8v4OSn1uiEOV99+Qx6N0CVgo0dpm1QaY0bRriWYekPfF+Y/a233wzxS8A/AjrN63neSSacsEv7pT2fyu/EhFv72DRGX+vTemSF6ULIq+l9BBNh9kXDKlC2hWwDxic8b2Jr25Jd2AMtuECRXQ2oW5q6pYgHNTZUXst5q6CcjQj3a/S0ppiPmX1RmCz0SGtfVifOcyWmW4oyU9QptHYs3W/2Md2EqhvhlBBvXqfLO6ANdFojZYVLQiSvyM5Z1ienWZuDlRcsYiAa+pxdlWrigWOypADF7Ll9bKBQxdugkBWRHwauOeeeFZFP3ol17wQ3MeHInDPPvwD4OfrgcmtAr1+hDbT/683HZ81jdOM5b3ium7/4hm0H+95YzB2/wTaA9DWP3HBdB/2RB51Fb4aVX371DbcffObBdTUi6+DeoHq5wZ2MsE8APyIiP9h8Rhf4Zd5BJpxj3Dne1Olwzv1j59wp59xZPEnK55xzP8E7yYRzjDvG21k4/xzvEBOOnc0Y/sDHfAdmw4lrYl+PYUOwgfDBn3iOrbyNdcK3XlwFKxBadFrjjGDHIeGupveyX3Dnc0I54zskbdA4HBHonIZW1jsFZRempwzLD2yzsTHji60ig9mOcW0DtRB2SqphxNxXfL2JLny9Rt0S0i2LCYVw6g5DUzZsaju00H9E4Z4Y8YHVdS7szbF5aY5gr+H/7Rp0r8Rux8hcSedLLcRA/ZnjhfM9heNS7e8gHBvsHsOxwe4xHIlo/YFYjlOQz3q++DoVsJ4GYnzacfapy1zYnKfbmdB/dZZwT1HNWaQUbMtCYOl9PUJVjmjoKHqKsufFdA50waqOZzMN971gTj7vCZmnKxa9mFOPQ5K1kHylRmrBxRY11Y3IjtB7ER89Ca5nE1QN8dA2HFbe2bDhAQUfXP6044GHNyjqgGuDNu5SRrgnFAsWvTLFOkHWWtQ9w9KfeGfknYh0vKvQ05r2S3uItXSdQ6YF9coMemcfO5PhvixUv7dM++GIhWcty3u77L1vHqf8F5Ts1KjKEvSHOKVwSeA5CHs+oqFKQ9WNSM5vMXnPEuGoQg8LipWMqJ9Td2LKXoQuHDasiXcKivkYp4T2S7uM3jNLdnnsScUmFbYdeaYAY1F5fV3O3l33FJ3WoIX7yg7xOc21v7zC0iuG3jPrjB9b9pXCUQjWMVlx6FLRvjT2/9PbiXT8WcAp37hgWzFVJ0IXnsquXuqCdaiy5uJPGsJXYOd9GarKMDFkG4bBgwEmFiaLiqTfovfSBD2tqDveWGUvBBeSz2lGp1bpXiqouhFVJyTcrykWWgzPhEyXhHgAcd8yWUypW4JJoMrmmCwr8tk288/tM72vQ7RXUc6FqNr5NqW9HGzTeekcTinf1WIc7W9scP6nTlG3LVWmscFJxDiqjkZVjsHDmqVnCvKFEJM0bUzHbG73Fo58xvngFyVhhDPG19Y/8agPrE5y3HCf4kMPoHNDvhDR+doGdmsHCQNYXkTKCjPfQW/0qS+ve1oIrVGdtj9/WflCmzCAyxuY0Qi9tAhlhen3QWmCkyuYxRn03pj6lQu+jqOssNs7SBJjH74Pef48kmVImoB11GuXUVmGnUyQwEcYRSs/PYpgpzn6gfsQ6xh+YBmrYeaL69RrlwlOrWLnOp6B9EvPEayexPYHfnrNj0fYPYUjP8KklSCPP46NAsT4rsuDtlSdG8puyODhkOyqYecJTe/8QWjIH69KqNpC72JN68oUGwfUWeDTKbslxUx4mDPzHp8jGhSYJKDONDYUdh4LSDc9O0AwcbT6lrLtxdiqzLcidV+devVYEWys0ZMKGp56KWt/7zI+1YJSYC1bT3uCFhOL56TX0L7inYoq9dRI4dh7xUvPjP1s85V3SMrj3cLxC
LsZx6Gp7yAcG+wew7HB7jEcCafDzqRMP/kRVOkXlHUihxL0qvbMAdH3bLO90UW3DHYQIYVgZ2pkrNEThUktvXOabNPf0PfOelGcdNNRzPiISNnz542GXlHIxN6hKGcdVc+ipgqTGcQJUnonAQe25Yg3Nb3zjSiP9Tk1gHDifB2IdZhIeR3LhpnUaeHqn9PUqwVnTuxw8co84VpMNBSvTX2yRoygx77AtfuKl7y3v32cD7uncOx0fAfhSEyJrpNSfewpL2ufqEY6XhOOvd6XMo7+IxqT+Jr2aI9DrS4T+1S/OGivG4LcHkbMJ4uB11DZqil7Gqub9H3tyK5WlN0Ap6HoKIYPQjAR0g1Hnfj6fjGeTkIXfmqeeanEJNqvDWcCdO4OaWlVZQ81w8CXNajaUfYCtt/nrz0cCcm2Ix5ep1waPAzpVaHswfy3mob2z91LtA/HOPqRjgNiFafFS71riAY145MR0b4fMVc+oQkmQrFsyC5ogqkvrCm7EPeh7EHnkiUZGIKJYXgmRtVepHQ6r6kyL84dTtxNDoMuLcPTAftnoXXVF9jk80J73esuq2aUiYX2miVoGK9N5PfVuR9ZOjdeaNU2Kk21wwbCZDlk68NeMNzEnlgl2XVMF+WwIrlz0ZHP+UgNDtznj0fYPYVjp+M7CMcGu8dwJO5hdFLqjzzlayFEKLs3NHeLeA2vOUe2DsMHIRx5Bdiqg8/0Wl9nsfzlCqfxYtuzwaGnZiIhn1WHHlz7qkGVjulCQN2C0f2+2DQa+HuWGDCR73wRA7oCVUBn3RBMrb/PivcEW9teXE7nBhNr7y025GY2UGx+2KvMmtTzCHfPK6I9R9URxqccwdjLHNsQFp6r/aL7+B52b+HIe4mkCfLexw/1S5xWnvGzP8W0Y8peSP8Rr8onTZVS0jf0HwrpXaiZzmucwOyLOeHOmKqRBD4omQ7GNSbWBPsl49MpwdQSb+eYLKTsBeQ9H8ZqbRv2HgiI+45o7MNNVUsR5I5wVHt6wHZANChRtaXsRUSDAqktNtKH8h1SW5xSoIXdxzueZu+0Y/abkG1UmJbvXasyxWRZITXMvlgS9RtW0z99m8Lb7zakrFHnL4N4g0m37d3jyYRgfpbgXJ/4d3YZ/5WPEu3VRIMCvTNCzBKtL5+n9eDqobI6V7cIN7fRD6zivvwc6slHUaMpIeACTW9ti3pjE/X+xwiu9tFrl0mffuKQUVRMSnbuGvWrF6k+9RR6akm/uQFaQVESZi0YDJF2RnIpR8IQN56gAZyFMPJJzYYzZPaZXXS3i6tr7BMPEmwNcWGAa0Wo0ZTeXBu9vg1B4AXvnLutLPDxlHgEceSnREli1MOPItZSLrfR09pPM02pQLSTc+mHe4RDmDnvlYlMrGht5lTdiLrlR2Z2eYIUFTaNsIFieH+L7gXfHKf3S/pPdJn7Wh+bRn7qaZya0ZkEq4XZc/sM3pMRTnx4Kp9VdC/Wh1Nr9qrvEi0XUlRhCAY5Ng0R60sEpFG3JdBQ1RBohu+dY7qg2H26ZvkPNOmWr6E0kWL7fRHLz+aMVyJaOzXhsPTlBs8d8RKBzswp94Hv+lnA1zmo2nm95aGlSoXRfQpxUHUc5UpF65XI9y5HXhK46lmCiSJd933R4dQxnfURkwPpKhP7yIIYyDZ8X/J4RSEGilnPvaFKwWn/fZQzlvZFTb7okAqSHSG95lm9fWrGny8aNekV4w5TKuJ8FbCNhPGSYnQ/JI/sUZYB+rk2uvRtUCaG6bIj3hVfjbzn0JXjG5/9JfZ3147wCNubkPzPLwHX20izG97vfRvnTN98lzva5+0iBRbf4jHKjW/53pEwmOum5J/8CFI7posBre2afFajKygzoW4Jw0cswUioM4cuhdaGV08XCyZxiBWSa17MLZgaposhRVcakTU/yqqOHxXxwHf628ivpWzkK7Dq1BH3hbLnqFNHsq2oW/74aAi9V6rDNeIBSbNTQjD18U6n8GwFTSWVU8J4WbPzl
AHtUBNNtKfILrvDa5muWOJthS7wujGA/d9HvLZeFTXtr67jshbpeQdakZ53niSs06KabWGjhHjgF8G9V0qUcV4efuwDuKpyxBv7/ovSQrSjoLa4Vkgxn3jSr/0C0/Uq7FJUuFCTL6fUqWL/hGb1D6eMziT0XnUk1wrGpxJMJKTXapyG1vo+2bmpJwqb6yCTApkWEIW+3PygIPaAv76qiU/Pky9ktC9b5r+wQXlyhnI2YlppwolFF5rWliWcWjrPXoEwQI9v7SUeCYM5rbALvcZhiLGRb2ZAQZ1o6lQzfKSm83JAnUE+H9O65qU9OmuW4ZmQuO+o0y7p5QlOCVU3opwJDiUKB4+2iYct4p2K4SMJ7fXy8L4zWtXowrH1wZRo6KhaUDzUIp8Tkl3H9pMh8a6jtS6MH13EhkKyVVCc7hD3PaObKmovQpDXmCjwdfbWki/FINB/rzBePUl6tZF1vFiw+eGEuXM1w/sC9luKaLDkIyVXb02/d0exRBG5ICLPicifisgzzbY5EfldEXmpeZxttouI/OuGuujrIvKhNz2/A5zDZjEu1NhYYyOFiTXRXolYR3YxYHLSUrUdC9+oCaeOeOAousoTscwIqnCHyg11W/tmBevTHNlGhZ5aJidiWls1wbjCRgob+T7lfEHIrvrQkzKQbtV+6rQQ7zp6r5ReGwZobeaowk+9B10skteHsvdiTKPrqUi2SqYrltlvOUzoE69B4di7P8bEsPVkQDRyBFOfBMW6m+SsXou3MsK+2zl3IznKzwO/75z7RRH5+eb1zwE/ADzc/H0Uz45za11BvGdVzbWwDSmKDQQTC7p02PmYvbM+M9x5VTF4X832EwHxwDFe9SI0xZwvrqkzRbGYNh0kgg2EnSdSgqlXsq0TP2WF+zA6kxIU3r3OZz2bznjZZ7frRCjbQj7vWW/CfSjmgibWCflCcri0UC1NuF9jw4P4ZwQiOO0/a3g2Rpam7D7eItoDq4W664uA6gzCoRcujQcOE2svxhrcehzdkVsvIheAp280mIi8AHzSOXdVRE4Af+Cce4+I/Nvm+a+/dr9bnf944Xwz3ol8mAP+r4g821AOASzfYIQNYLl5fkhd1OBGWqNDiMjPiMgzIvJMRXGHl3GMO50S/7xzbl1EloDfFZFzN77pnHMi8pZW4DdSF/XCJafnFpGs5aezMIAo9NSsEx+pGHz8FHG/ZnQ6Ipw44oEPxk5WImY++y3fDlRUyMQbX2rjW5DWt7HLc6i9MeWpOYJRgdrew6UJ9Pdgtod56RV2/45Xau++WlLMhbQvTQjWthl+5DQmEtqXc4IX1qCukXYbl7Vgp+9bm0RwtfGtRoAz1pcL1DUszTN4cp6yK8w9P2HwcMriH67jwoDJQ/OEkxo1rVF5hdoZQqCRK+Ebf2l8G5EOEfkFYB/4uxxPie8K3lYsUUQyQDnnRs3z7wX+Gdcpin6R11MX/QMR+Qze2di7nbEAyFrIe5uIuQOTBjf1+ppWwJVPxHQvOPqPQfdlf5iqoU49jWvZEboXa5LtHKkt0xMprctjpqcyJgsaG0H3Ys1kKSDpG5yGYN9QzIUEU8vuYwHtNUs+14S0LMyeKxifjBifFDqXLJ0LUx8K6/p2WZz37IJRgdOCTUKc4Ft+8VmIfCXj0vcFRENFsuUTpdmmpegJxaxfPLfXHOm1mtb6CBcF8PzbS68sA7/pKRIJgP/inPusiHwZ+A0R+WngIvDXmv1/B/hB4GVgAvzUm36CO3CHHSb11HZVFqBDhdXC3gMh1aMT6o0WOJguCUF+XTJqvOpItv2XV7VDdOFjftVcQt1S1C3fgzV4KGT2XNH8CDTB1DBqR1z7kMa0LKAIxlBlUGeOyUpEerUpG3c+gmESTZAb6sT3KKui9v1g1qL3i+tN6VEAzrF3f4ibK6lsSJ0Is+cgGhnymYCy52hf8j+6vftDdJmhyluLvcERCf6KyAh44W5fxx1igddwP74LOOOce8MQ5JGIdAAvO
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 681/681 [05:42<00:00, 1.99it/s, cost=0.54] \n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch 10, training avg cost 0.538868\n" + ] + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3deZxcVZn/8c/TXb1k6azdWcjWgSSQBUwgJtGwKWAWlIgwCgMOMmhGZ1AcXH5xVGRgdEZ4OfobB0UcHZVRER2X6C+ICwqjDEtCICEJISEBspGELJ2tO72d3x91u1O9VHd1172369z+vl+vvHKr6vZzTp17++lb9556rjnnEBER/xX1dQdERCQcSugiIgmhhC4ikhBK6CIiCaGELiKSEKm+ariystJVV1f3VfMiIl5as2bN6865qs5e67OEXl1dzerVq/uqeRERL5nZK9le0ykXEZGEUEIXEUkIJXQRkYRQQhcRSQgldBGRhOg2oZvZt81sn5k9n+V1M7N/M7OtZrbOzM4Nv5siItKdXI7QvwMs7uL1JcDU4N9y4Ov5d0tERHqq23nozrnHzKy6i1WWAd9z6Tq8T5jZMDMb65zbE1If23rlf+GlR9LLxSVw3o0wuNM59vlb812o2ZlePutyOG12NO08/1PYtym9fMZbYNKbw2+juRmevBdqD0FRMcy+DoZNCL+dF1bB7rXp5WmLYPzccOM31KbfR/0JKB0E8z8IJeXhxa/ZCWv/C5qbYNR0mPWu8GLvXAMv/jq9PPOdMHpmOHFX/ycc2Q2DKmHecjDLL972/4Htj0HpQJj3N+n/e2vdg/D6FhgzC2Ys612MF38DO59O//6ddXnv+9IPhPHFonHAjozHO4PnOiR0M1tO+iieiRMn9q61nU/BY3cDQR33gSPhjTf1LlZXag/DLz9y6vH+F+A994ffDsDKD0P9sfTyK3+GG1eF38brm+HhT516bEVw0SfDb2fVx+HIrvTy7rVw/U/Cjf/q/8Lvbj/1eNx5MPmC8OI/+wP44z+nl0srwk3oj911KqHX7IQrQ/gwe/wA/Oqjpx5PWwTDq/OL+bvPwa416eXRZ8PUS3sf62d/A64ZBgzvfUJ/6JNwaDtUnKaE3o1YL4o65+5zzs11zs2tqurlUfXCW+D2w/CJl4KgzeF1MFNL3CV3wehZ0bUD6aPBN38EJl+UXo6qDYB339/2cRTtnHsDjH8juAjaaA62w2V3pv8Pu42WcXnzh6OJfdq56YQbVuyWOBPmn2ojX81NMGhU2/i91fJ705zH709LH6LYnxImjIS+C8j87D4+eE5ERGIURkJfCfxVMNtlAVAT2flzERHJqttz6Gb2Q+BioNLMdgKfA0oAnHP3AquApcBW4ARwY1SdFRGR7HKZ5XJtN6874O9C65GIiPSK/98UdS6euFG1kw6eZTmqNmJqJ5Ixi3q7RNn/CGJHPcb5xHch7dctPxrp72Ay+J/QpX/Kd6514oQ8HhpfL3mc0OPa4Syetszi+SWKrY2o2/E0flRjE/p2VUL3kccJXUREMimhi4gkhBK6iEhCJCChxzUrJEIu6pkhncSNpZ0
I2ujQ77hmH4URLoqxiXqMQ5rlktf+5tr9L9kkIKHHRTtTv+Dt1LjgIqa3/Zcw+JvQ45pWFdv0rZhm08TSTgwzdryNH9HYhB1T0xa95G9Cj5P27QIU5UbRBtcY+EkJXUQkIZTQRUQSQgldRCQh/E/ovk/zSwfPshxVGzG1o+Jc0ceOZBpn2NMN2y/3NIwLoS/9g/8JXfonzcJoR7NcRAk9RwkrzhXLlD9Pi2dBtP1XcS6JkBK6iEhCKKGLiCSEErqISEIkIKGrOFeP24itHRXnahvOl+JcIcVXca7YJSChx0U7U7/g+9Q43/svefE3oas4VwG3o+JcXQRWcS6JjL8JPU7auQuQinNFS2PgIyV0EZGEUEIXEUkIJXQRkYTwP6HHNc0vUirO1av4nTwMtymfinOFec5bxbl85XFCj3P2CdHvTKrl0os2ohBFgswQeS2XkPZTXRP1kscJPU7au/sVzWpC+7yflNBFRBJCCV1EJCFySuhmttjMNpvZVjNb0cnrE83sD2a21szWmdnS8LsqIiJd6Tahm1kxcA+wBJgBXGtmM9qt9hngQefcHOAa4GthdzQ7FefqcRtxthN5fBXnCl1Y/VRxrtjlcoQ+D9jqnNvmnKsHHgCWtVvHAUOC5aHA7vC6WCi0M/UL3k6Ni2k2lhS0XBL6OGBHxuOdwXOZbgeuN7OdwCrgw50FMrPlZrbazFbv37+/F91tEyy/ny+0dlScq4dN+BpfxbkkOmFdFL0W+I5zbjywFLjfzDrEds7d55yb65ybW1VVFVLTMdDOXYBUnCtaGgMf5ZLQdwETMh6PD57LdBPwIIBz7n+BcqAyjA6KiEhucknoTwNTzWyymZWSvui5st06rwKXAJjZdNIJPc9zKiIi0hPdJnTnXCNwM/AwsIn0bJYNZnaHmV0RrPYx4ANm9hzwQ+B9zsV0dcbX2RptG8uyHFUbMbUTSy2XCLeTV7VcwqRaLr5K5bKSc24V6Yudmc/dlrG8EVgYbtdEJHc65y1ef1NUxbkKth0V58rOm+Jc+gPhI48Tepy0cxeeCLeJkhna5/2khC4ikhBK6CIiCaGELiKSEAlI6CrOlVsbXbQZajsRT8FUca72QYP/QzznreJc3vI3ocdeyyXqnUm1XHrWRETxWxKPr7VcwvpDrQvDXvI3oYuISBtK6LnQ0UoBUnGuaGkMfKSELiKSEEroIiIJ4X9CV3GuXrQRUzsqzhV97H5VnCuPrvQT/id0EUHnvAW8TugqzlWw7cT1PiKh4lxt44USLMRY0hWPE7qIiGRSQs+JjjAKj6otRivEMdB4xkYJXUQkIRKQ0BNQy6VNszHN2omlZoxqubQNF2UtlxCplou3EpDQRUSnBQV8TugqzlXA7Xg8k0bFudrGCydYiLGkK/4mdBERaUMJPRe6Sl+AVJwrWprl4iMldBGRhFBCFxFJCP8Tuu/FuWKbhqfiXH0vgrFpjRPmaY1CLc7l07buGx4n9JhruUTeTFy1XIhpBoqns1xONRBR2KhruRRiPJ1Dj4vHCT1mOjroJ3zfzr73X/KhhC4ikhBK6DnRR8bCo+Jc0dK0RR8poYuIJEROCd3MFpvZZjPbamYrsqzzbjPbaGYbzOwH4XazK54X5+pQNCuuduKYHeR7ca7O2iuQWKeCRhBSxbl8lepuBTMrBu4BLgN2Ak+b2Urn3MaMdaYCnwIWOucOmdmoqDosIiKdy+UIfR6w1Tm3zTlXDzwALGu3zgeAe5xzhwCcc/vC7WYnYi/OFXlDqDhXT5rwOH6UxbkKMp7Ooccll4Q+DtiR8Xhn8FymacA0M/uzmT1hZos7C2Rmy81stZmt3r9/f+963Gf0ca9f8H16qu/9l7yEdVE0BUwFLgauBb5pZsPar+Scu885N9c5N7eqqiqkpmOgq/QFSMW5oqVZLj7KJaHvAiZkPB4fPJdpJ7DSOdfgnNsOvEg6wYuISExySehPA1PNbLKZlQLXACvbrfNz0kfnmFkl6VMw20L
sZ3a+13LpixorkUpYLZdQ40dZyyVMquXiq24TunOuEbgZeBjYBDzonNtgZneY2RXBag8DB8xsI/AH4BPOuQNRdVpE2tNpDclh2iKAc24VsKrdc7dlLDvg1uBfTFScq2DbUXGuLsL6UpyrYINJF/RN0Vzp4554Qftpf6aELiKSEEroOdFHxsLj2Rd/vKNpiz5SQhcRSYgEJHTPp/n1SdGsuNpJQHGuMONHOjYhHgVHMaVSxbli4W9CT2Qtl7jaiWk2jeJ3Fjia4VctF8HnhC4iIm0ooedMH/f6Bd+np/ref8mLEnoudJW+n9H21iwXPymhi4gkhP8JXcW5CrcdFedqHyz8uCrOJRn8T+gigk4TCXid0BM2ndBQca4etxFpAxGF9aU4l6Yt+iinaouCPu71G75vZ9/7H641rxzkW3/aXnC/vtfOm8iF08K/a5sSuogk1i+f28Ovn3+NKaMG93VX2qipbYgkrhJ6TvSRsfCoOFe0kjNtsaK8hN/8/UV92oe4eHwOvYVquRRuO6rl0jZUFGMT9RiHMDslb72v5eKc6+u/J7FKQEIXEX2K7Jyjf42Mvwk9kcW54mgrjnbimkkTZfwIA0fRd81y6ZRzYP3oEN3fhC4i0o1m53SELtJvFdr8tp7yvf8hc/T5NdlYKaGLSGKl/771n4yuhJ6L/vQn3htRbhNt73CnLYYXquc0y8Uvkc1aVHGuvNuJYwxVnCtYCDNrRVGcK49YrbMWezNtsX/9efY4oSetlksMM0Na24mhDdVyyRJWtVzilJ7l0mfNx07fFO3GyaYmGk42Mri8lBf3HqWiPMXYoQOob2xmy76jDCkvYciAEoaUp2hqdhQXGQ1NjiN1DVQOLmsTq6GpGQNSxUX8fO0upo4ezBnDSyiP8f08v6uGKY3NHKqp4913PcI1b5zItNEVNDU388H/eoa7rjqHgWXFfPm3L3Lnslls2XeMVLExdVQFB46dZPzwgTQ7x7GTjfx87S4uOrOKm3+wlrPHDWXxrDF8yDmO1zWy87UjTB7YzHvvfZyFUyqZPWEYx082seKn6/js22dQObiUr/xuC/94xUye31UDwJyJw9l7pI7qykEcPlHPS/uPc6S2gcFlKVb8dD2fe8cMpu7dz/kxjlcUopgbXVPbwFBg37GTDDzZyKDSYo6dbGRASTGp4tyO2361bjdLZ42lCHitpo4xwO827WXzwa0snFLJht01XDi1iie3H6SiPJ06Jo0ciHNQPXIQOw6d4PldNRQXGbc88Cz/ec1U3hLy++wph8P60TG6EnoWR+oaGAJ849FtzD9Ug8O45suPhd5OKQ28WA67DtcyLvToHX1+1SbuLWni18/vYUdjLXc/vLnN65/873Wty3/5H092G+/Ha3YCsH5XDet31XBTWTPff/JVzrQ6Th45ytP1h3j65UNt2/jJqTau/NrjOff9H3+5kSuKdnJ+afANwJx/sieiPYWzdsdhiutq2O+Mm1b8v7zjnWG7+H0Z/P6FfbyrGN7/nadZ5w7m00MeLT3BAVKMKYJfrdvDz5s3d9hPcnHLA2tZVw6Nza7PEk1/O0L3+JRLtFqS0Lb9x2Np7w+b98fSTlL4ODmvoamZoxEVZSpk9U3Nfda2vikqAJSmTg1NHF9NiLqFnYdORNxCvCL99l9EoTftORpN4AiEu8/39Tn0/pPSvTvlUlvfxIbdNTy2eQ+3Al/6zQt89aH8P7q2V217+GNZ9+vly4JjzSKD2oYmBjhHfWMzr9XUUVGe4sDx9HnrkuIituw7ypmjK3h+1xGmj62guMg4UttIfVMzxUXGiEGlrXGbmh1Hahswg0FlKR58ege3Rv922vzqRnlKZP/RekYBNbX1lNQ3svtwLbsP11E9chBlJUU8unk/E0cOpKGpmbPHDWXD7iMMLktR29DEvOoR1Dc1U5Yq4khdI2teOcioinJONjYz+tAJxpH+AzgBOHziJHuO11FT20BdQxNzq0fw3cdf5qJpVdQ1NDH
jtCGs31lDUZFx/GQj50+ppL6pmYGlKXYfrmX1K4c4vXIQR+saKW8+daRqBfwZI7Nv+fSzZfufuoNcM3UN8PKB40yuHERZqoia2gaGlJfQ7Bx7auqYMGIgAHUNTdQ1NFHf2EwV6WtP9U1NvP3Lj/Li3mOUFhe1OfIfUp6ivKSY+aePZPXLB9lTU9f6WnlJ/zlu9S6h3/CfT/HU9oMU08StcV5NjNiOg7U8cngf0+wYl33modDjLyg6CKXdr+eLz/7ieb5RCp/66XpW/WRQaHE/l9rLu4ob+OFTO/hkCcz7wu+pp6TDer05p/yzCMe/cP88pE+5DDI4/R9W4XpxUmBdWSNDLP2H4cW9x1pjZjpS18iRukZ++dzuDj9f19B3p3ziltPomtliM9tsZlvNbEUX611lZs7M5obXxbYWzxwDxHMaJN1OPBwWy3uKox1HHNvHz/jRjX+4McPsYyH/sUmabhO6mRUD9wBLgBnAtWY2o5P1KoBbgO6nRuRhQGlxlOFFRLyVyxH6PGCrc26bc64eeABY1sl6dwJfBOo6eS00JTnOqQ1bIZ/zFGmh/bR/yyU7jgN2ZDzeGTzXyszOBSY457q8Omlmy81stZmt3r+/d9P0Sor7zxVrEZGeyPtw18yKgH8FPtbdus65+5xzc51zc6uqenfH6/ZH6FGl98y4zsX3RySqI6z2caM7kgtnhkQ2LTGjqGDSIvP8cZjvIazZI9lihsXIHN98Zrm0bCtrjdvb/uTbl/4il4S+C5iQ8Xh88FyLCmAW8EczexlYAKyM6sJoX51yESlk/es2DpJNLtnxaWCqmU02s1LgGmBly4vOuRrnXKVzrto5Vw08AVzhnFsdRYdbTrnEOfsknnYgji9gOBfHLJd42vAxvovoODPsopPhznLRH5u4dJvQnXONwM3Aw8Am4EHn3AYzu8PMroi6g+2V6ghdRKRTOX2xyDm3CljV7rnbsqx7cf7dyq4kpYQu0fH9PK2Ohfs377JjcZF2WRGRzniX0Psinft9zJZMUZ6X1TlffVPUV/4l9HaV0+Ka5heV+KYTdt1uFHGjmlKXrb3Clzk24YikiICF08+w9m2LbcqA//xL6H3dAZECpE8VAh4m9KN1jUCcO3Bc0xbjOdZ0RP8ROK42fIwf1bFmYSf0Qu5bsniX0JviuJO8iIiHvEvofTXJxa9ztdJbvm9n3/sv+fEuoRfCAfq86hFZX/vvD70pxp703BmjBgNQHFxcfvijF/K1687tdN1/ftfZvOMNp2WNtXDKSACuaLfOuGED2jz+zOXT+cH753ca466rz2lzp6VsJowY0O06Eo2bzp+c87qXnDWKLZ9fwpVz4rjlubTn3R2Lmttl9Mwr8mFqOdJZcPpIJh4bROUA4wuzz2bp2WMYOqCENa8c4s9bD7Te4q2qooyhA0o4b9II/umds/jCqk08dMsFPLh6B/+z5XXuvvoNPP7S61x13ni+/seXuPWyaXz/iVeYPjKVLkgcsZb3c/W54xn+RCnXzZzI1YsWU15SzJljKtj2haXUNjRRmiqiqTm9blmqiPfMncCdy2YyqCxFc3B7vLJUMc3OUZYq4mRjM+Ulxdz9F+dQbEbx541ls8fR9NphUsea2HDjIgaVpXezbV9Ymr4VW2m6DefSY/euOePYf+wkoyrKW287Nrg8RX0Qu76xmQGlxRw/2cgvvrsBdmcWfAp/+2dGDLc41ylVFaUMPFbMc597G0v+7/+wdd+xNus+8rGLeOuXHgXg+gUT+a8nXgVovfXa0rPHsGr9a639e8OEobAH/nL+RO5fsoiDx+q541cbqaooY+ZpQzh8op4PXTyFZff8ievnT2Lk4DL+4Wfred+bq7lk+ihSRUbl4DKO1jWS+qoxa8ww2AO3vWM6QxZMZ8WSs9h5qJZUkVFRnqKoyGhqchyta2TssHJeOXCCwWUpRgwqpaS4iC+/ZzZ1i8bCV/LfVqdmuRTA0VyB8z6hRy1VbIwfPgCaGvjL+RN
[Embedded matplotlib PNG outputs (base64 image data from notebook `display_data` cells) omitted.]
OeevtsXcScQkWfu5rUeV03dYzg22D2Go2Kwf3e3L+At4K5e65FwOo5x5zgqI+wYd4i7bjAR+X4ReaFJx/z8Xb6Wfy8i10TkGzdse8fSSO8E7qrBREQDv4JPybwX+Bsi8t67eEn/Afj+12w7SCM9DPx+8xpuTiP9DD6N9K7jbo+wjwAvO+decc6VwGfwyhJ3Bc65P8ITS9+IH8UrX9A8/tgN2/+j8/ginhb+xLt9jXfbYHeUirnLeFtppHcad9tg9xQa/v276lbfbYMdqEgc4EaFiaOCzYOprnm81my/K9d+tw32ZeDhRosswosS/NZdvqbX4kali9emkf5W4y1+jDtJI70TcM7d1T98KuZF4DzwT+7ytfw6cBWo8Pekn8YrM/0+8BLwe8Bcs6/gPdzzwHP4mpd3/RqPIx33GO72lHiMt4hjg91jODbYPYZjg91jODbYPYZjg91jODbYPYZjg91j+P97ZD2vY5SrVAAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "from tqdm import tqdm\n", + "import time\n", + "\n", + "LOSS = []\n", + "maxlen = 50000\n", + "\n", + "for count in range(epoch):\n", + " pbar = tqdm(\n", + " range(0, len(X), batch_size), desc = 'minibatch loop')\n", + " train_cost = []\n", + " for i in pbar:\n", + " batch_x = X[i : min(i + batch_size, len(X))]\n", + " batch_x = tf.keras.preprocessing.sequence.pad_sequences(\n", + " batch_x, dtype = 'float32', padding = 'post'\n", + " )[:, :maxlen]\n", + " while True:\n", + " try:\n", + " _, cost = sess.run(\n", + " [model.optimizer, model.cost],\n", + " feed_dict = {model.X: batch_x},\n", + " )\n", + " break\n", + " except Exception as e:\n", + " print(e)\n", + " time.sleep(1)\n", + " train_cost.append(cost)\n", + " pbar.set_postfix(cost = cost)\n", + " train_cost = np.mean(train_cost)\n", + " LOSS.append(train_cost)\n", + " print('epoch %d, training avg cost %f'%(count + 1, train_cost))\n", + " \n", + " p, l, t = sess.run([tf.nn.sigmoid(model.predictions), model.labels, model.targets], feed_dict = {model.X: X[:1]})\n", + " plt.plot(p)\n", + " plt.plot(l)\n", + " plt.show()\n", + " \n", + " plt.imshow(t[0].T)\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "batch_x = X[-2:]\n", + "batch_x = tf.keras.preprocessing.sequence.pad_sequences(\n", + " batch_x, dtype = 'float32', padding = 'post'\n", + ")\n", + "logits, targets, neg = sess.run([model.logits, model.targets, model.negatives], feed_dict = {model.X: batch_x})\n", + "logits.shape, targets.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "plt.figure(figsize = (15, 5))\n", + "\n", + "plt.subplot(1,3,1)\n", + "plt.imshow(targets[0].T)\n", + "plt.subplot(1,3,2)\n", + "plt.imshow(logits[0].T)\n", + 
"plt.subplot(1,3,3)\n", + "plt.plot(X[-2])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "p, l = sess.run([tf.nn.sigmoid(model.predictions), model.labels], feed_dict = {model.X: X[:1]})\n", + "plt.plot(p)\n", + "plt.plot(l)\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "saver = tf.train.Saver(tf.trainable_variables())\n", + "saver.save(sess, 'wav2vec/model.ckpt')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/spelling-correction/1.bert-base.ipynb b/spelling-correction/1.bert-base.ipynb new file mode 100644 index 0000000..9ef5f62 --- /dev/null +++ b/spelling-correction/1.bert-base.ipynb @@ -0,0 +1,784 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# data from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/utils/helpers.py\n", + "# reuploaded to husein's S3\n", + "# !wget https://malaya-dataset.s3-ap-southeast-1.amazonaws.com/counts_1grams.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "with open('counts_1grams.txt') as fopen:\n", + " f = fopen.read().split('\\n')[:-1]\n", + " \n", + "words = {}\n", + "for l in f:\n", + " w, c = l.split('\\t')\n", + " c = int(c)\n", + " words[w] = c + words.get(w, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + 
"metadata": {}, + "outputs": [], + "source": [ + "# original from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/classes/spellcorrect.py\n", + "# improved version\n", + "\n", + "import re\n", + "\n", + "# token pattern (assumed here; the upstream ekphrasis module defines its own REGEX_TOKEN)\n", + "REGEX_TOKEN = re.compile(r'\\w+')\n", + "\n", + "class SpellCorrector:\n", + " \"\"\"\n", + " Extends Peter Norvig's spell corrector\n", + " from http://norvig.com/spell-correct.html\n", + " \"\"\"\n", + "\n", + " def __init__(self):\n", + " \"\"\"\n", + " Uses the unigram counts loaded above (`words`) as the corpus statistics.\n", + " \"\"\"\n", + " super().__init__()\n", + " self.WORDS = words\n", + " self.N = sum(self.WORDS.values())\n", + " \n", + " @staticmethod\n", + " def tokens(text):\n", + " return REGEX_TOKEN.findall(text.lower())\n", + "\n", + " def P(self, word):\n", + " \"\"\"\n", + " Probability of `word`.\n", + " \"\"\"\n", + " return self.WORDS[word] / self.N\n", + "\n", + " def most_probable(self, words):\n", + " _known = self.known(words)\n", + " if _known:\n", + " return max(_known, key=self.P)\n", + " else:\n", + " return []\n", + "\n", + " @staticmethod\n", + " def edit_step(word):\n", + " \"\"\"\n", + " All edits that are one edit away from `word`.\n", + " \"\"\"\n", + " letters = 'abcdefghijklmnopqrstuvwxyz'\n", + " splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n", + " deletes = [L + R[1:] for L, R in splits if R]\n", + " transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]\n", + " replaces = [L + c + R[1:] for L, R in splits if R for c in letters]\n", + " inserts = [L + c + R for L, R in splits for c in letters]\n", + " return set(deletes + transposes + replaces + inserts)\n", + "\n", + " def edits2(self, word):\n", + " \"\"\"\n", + " All edits that are two edits away from `word`.\n", + " \"\"\"\n", + " return (e2 for e1 in self.edit_step(word)\n", + " for e2 in self.edit_step(e1))\n", + "\n", + " def known(self, words):\n", + " \"\"\"\n", + " The subset of 
`words` that appear in the dictionary of WORDS.\n", + " \"\"\"\n", + " return set(w for w in words if w in self.WORDS)\n", + "\n", + " def edit_candidates(self, word, assume_wrong=False, fast=True):\n", + " \"\"\"\n", + " Generate possible spelling corrections for word.\n", + " \"\"\"\n", + "\n", + " if fast:\n", + " ttt = self.known(self.edit_step(word)) or {word}\n", + " else:\n", + " ttt = self.known(self.edit_step(word)) or self.known(self.edits2(word)) or {word}\n", + " \n", + " ttt = self.known([word]) | ttt\n", + " return list(ttt)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "corrector = SpellCorrector()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['sting',\n", + " 'eying',\n", + " 'epting',\n", + " 'etang',\n", + " 'eling',\n", + " 'ewing',\n", + " 'ebing',\n", + " 'eting',\n", + " 'geting',\n", + " 'eking',\n", + " 'eing',\n", + " 'etin',\n", + " 'etling',\n", + " 'meting',\n", + " 'enting',\n", + " 'etting',\n", + " 'ting',\n", + " 'ering',\n", + " 'eating',\n", + " 'edting',\n", + " 'ating',\n", + " 'elting',\n", + " 'reting',\n", + " 'kting',\n", + " 'beting']" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "possible_states = corrector.edit_candidates('eting')\n", + "possible_states" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + 
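As an aside, the `edit_candidates` output above is the classic Norvig recipe: generate every string one edit away (delete, transpose, replace, insert), keep only strings present in the vocabulary, and rank survivors by unigram probability. A minimal standalone sketch of that recipe — the toy `WORDS` counts below are illustrative, not the `counts_1grams.txt` statistics:

```python
# Standalone sketch of Norvig-style candidate generation and ranking.
# The WORDS counts are made-up toy values, not real corpus statistics.
WORDS = {'eating': 100, 'acting': 40, 'sting': 10}
N = sum(WORDS.values())

def edit_step(word):
    # All strings one edit (delete/transpose/replace/insert) away from `word`.
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    # Keep only candidates found in the vocabulary, pick the most frequent.
    candidates = {w for w in edit_step(word) if w in WORDS} or {word}
    return max(candidates, key=lambda w: WORDS.get(w, 0) / N)

print(correct('eting'))  # 'eating' is one insert away and the most frequent
```

With real counts this ranking by `P(word) = count / N` is exactly what `most_probable` does in the class above.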
"cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " 
np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = 
np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "tokenization.validate_case_matches_checkpoint(True,BERT_INIT_CHKPNT)\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'scientist suggests **mask** burger can lead to obesity'" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = 'scientist suggests eting burger can lead to obesity'\n", + "text_mask = text.replace('eting', '**mask**')\n", + "text_mask" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "def tokens_to_masked_ids(tokens, mask_ind):\n", + " masked_tokens = tokens[:]\n", + " masked_tokens[mask_ind] = \"[MASK]\"\n", + " masked_tokens = [\"[CLS]\"] + masked_tokens + [\"[SEP]\"]\n", + " masked_ids = tokenizer.convert_tokens_to_ids(masked_tokens)\n", + " return masked_ids" + ] + }, + 
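`tokens_to_masked_ids` builds one masked copy of the token sequence per position; each copy is later fed to the masked-LM head built below, and a candidate correction is scored by the probability its token id receives at the masked slot. A self-contained sketch of just the masking step, with a stub vocabulary standing in for the BERT `FullTokenizer` (the id values here are made up):

```python
# Sketch of tokens_to_masked_ids with a stub vocabulary in place of the
# real BERT FullTokenizer; the token ids below are illustrative only.
VOCAB = {'[CLS]': 101, '[SEP]': 102, '[MASK]': 103,
         'scientist': 1, 'suggests': 2, 'burger': 3}

def convert_tokens_to_ids(tokens):
    return [VOCAB[t] for t in tokens]

def tokens_to_masked_ids(tokens, mask_ind):
    # Copy the tokens, mask one position, wrap with BERT's special tokens.
    masked_tokens = tokens[:]
    masked_tokens[mask_ind] = '[MASK]'
    masked_tokens = ['[CLS]'] + masked_tokens + ['[SEP]']
    return convert_tokens_to_ids(masked_tokens)

tokens = ['scientist', 'suggests', 'burger']
print(tokens_to_masked_ids(tokens, 1))  # [101, 1, 103, 3, 102]
```

Note the `tokens[:]` copy: the caller's list is left untouched, so the same sentence can be masked at every position in turn.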
{ + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.variable_scope('cls/predictions'):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " output_layer,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + " \n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " self.logits = tf.nn.bias_add(logits, output_bias)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. 
Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " ,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cls = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'cls')\n", + "cls" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = var_lists + cls)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['scientist suggests sting burger can lead to obesity',\n", + " 'scientist suggests eying burger can lead 
to obesity',\n", + " 'scientist suggests epting burger can lead to obesity',\n", + " 'scientist suggests etang burger can lead to obesity',\n", + " 'scientist suggests eling burger can lead to obesity',\n", + " 'scientist suggests ewing burger can lead to obesity',\n", + " 'scientist suggests ebing burger can lead to obesity',\n", + " 'scientist suggests eting burger can lead to obesity',\n", + " 'scientist suggests geting burger can lead to obesity',\n", + " 'scientist suggests eking burger can lead to obesity',\n", + " 'scientist suggests eing burger can lead to obesity',\n", + " 'scientist suggests etin burger can lead to obesity',\n", + " 'scientist suggests etling burger can lead to obesity',\n", + " 'scientist suggests meting burger can lead to obesity',\n", + " 'scientist suggests enting burger can lead to obesity',\n", + " 'scientist suggests etting burger can lead to obesity',\n", + " 'scientist suggests ting burger can lead to obesity',\n", + " 'scientist suggests ering burger can lead to obesity',\n", + " 'scientist suggests eating burger can lead to obesity',\n", + " 'scientist suggests edting burger can lead to obesity',\n", + " 'scientist suggests ating burger can lead to obesity',\n", + " 'scientist suggests elting burger can lead to obesity',\n", + " 'scientist suggests reting burger can lead to obesity',\n", + " 'scientist suggests kting burger can lead to obesity',\n", + " 'scientist suggests beting burger can lead to obesity']" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "replaced_masks = [text_mask.replace('**mask**', state) for state in possible_states]\n", + "replaced_masks" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "def get_score(mask):\n", + " tokens = tokenizer.tokenize(mask)\n", + " input_ids = [tokens_to_masked_ids(tokens, i) for i in range(len(tokens))]\n", + " preds = 
sess.run(tf.nn.softmax(model.logits), feed_dict = {model.X: input_ids})\n", + " tokens_ids = tokenizer.convert_tokens_to_ids(tokens)\n", + " return np.prod([preds[i, i + 1, x] for i, x in enumerate(tokens_ids)])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[4.1319517e-26,\n", + " 1.0869552e-22,\n", + " 1.5277633e-28,\n", + " 2.4073189e-23,\n", + " 3.768507e-25,\n", + " 4.0500935e-25,\n", + " 1.3330295e-28,\n", + " 9.22324e-29,\n", + " 1.7535894e-26,\n", + " 9.990078e-24,\n", + " 2.9091794e-26,\n", + " 3.610259e-28,\n", + " 5.3360014e-29,\n", + " 2.9510165e-26,\n", + " 7.3765675e-27,\n", + " 2.3287322e-26,\n", + " 1.0582614e-27,\n", + " 2.0237078e-22,\n", + " 1.3728026e-17,\n", + " 1.438714e-28,\n", + " 1.0349554e-25,\n", + " 7.9180676e-28,\n", + " 2.1763072e-27,\n", + " 6.3879305e-29,\n", + " 5.0968306e-28]" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores = [get_score(mask) for mask in replaced_masks]\n", + "scores" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([3.00979019e-09, 7.91758248e-06, 1.11285104e-11, 1.75353557e-06,\n", + " 2.74505041e-08, 2.95016296e-08, 9.71003399e-12, 6.71837889e-12,\n", + " 1.27734701e-09, 7.27695749e-07, 2.11910023e-09, 2.62977938e-11,\n", + " 3.88684224e-12, 2.14957496e-09, 5.37322797e-10, 1.69629166e-09,\n", + " 7.70857198e-11, 1.47410619e-05, 9.99974787e-01, 1.04798592e-11,\n", + " 7.53880691e-09, 5.76766690e-11, 1.58526248e-10, 4.65308694e-12,\n", + " 3.71262569e-11], dtype=float32)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prob_scores = np.array(scores) / np.sum(scores)\n", + "prob_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ 
+ "[('sting', 3.0097902e-09),\n", + " ('eying', 7.9175825e-06),\n", + " ('epting', 1.11285104e-11),\n", + " ('etang', 1.7535356e-06),\n", + " ('eling', 2.7450504e-08),\n", + " ('ewing', 2.950163e-08),\n", + " ('ebing', 9.710034e-12),\n", + " ('eting', 6.718379e-12),\n", + " ('geting', 1.277347e-09),\n", + " ('eking', 7.2769575e-07),\n", + " ('eing', 2.1191002e-09),\n", + " ('etin', 2.6297794e-11),\n", + " ('etling', 3.8868422e-12),\n", + " ('meting', 2.149575e-09),\n", + " ('enting', 5.373228e-10),\n", + " ('etting', 1.6962917e-09),\n", + " ('ting', 7.708572e-11),\n", + " ('ering', 1.4741062e-05),\n", + " ('eating', 0.9999748),\n", + " ('edting', 1.0479859e-11),\n", + " ('ating', 7.538807e-09),\n", + " ('elting', 5.767667e-11),\n", + " ('reting', 1.5852625e-10),\n", + " ('kting', 4.653087e-12),\n", + " ('beting', 3.7126257e-11)]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(zip(possible_states, prob_scores))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/spelling-correction/2.xlnet-base.ipynb b/spelling-correction/2.xlnet-base.ipynb new file mode 100644 index 0000000..7466439 --- /dev/null +++ b/spelling-correction/2.xlnet-base.ipynb @@ -0,0 +1,846 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n", + "# !unzip xlnet.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + 
"import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import sentencepiece as spm\n", + "from prepro_utils import preprocess_text, encode_ids\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "\n", + "def tokenize_fn(text):\n", + " text = preprocess_text(text, lower= False)\n", + " return encode_ids(sp_model, text)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP = 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + " \"\" : 0,\n", + " \"\" : 1,\n", + " \"\" : 2,\n", + " \"\" : 3,\n", + " \"\" : 4,\n", + " \"\" : 5,\n", + " \"\" : 6,\n", + " \"\" : 7,\n", + " \"\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"\"]\n", + "CLS_ID = special_symbols[\"\"]\n", + "SEP_ID = special_symbols[\"\"]\n", + "MASK_ID = special_symbols[\"\"]\n", + "EOD_ID = special_symbols[\"\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# data from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/utils/helpers.py\n", + "# reuploaded to husein's S3\n", + "# !wget https://malaya-dataset.s3-ap-southeast-1.amazonaws.com/counts_1grams.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "with open('counts_1grams.txt') as fopen:\n", + " f = fopen.read().split('\\n')[:-1]\n", + " \n", + "words = {}\n", + "for l in f:\n", + " w, c = l.split('\\t')\n", + " c = int(c)\n", + " words[w] = c + words.get(w, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# original from 
https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/classes/spellcorrect.py\n", + "# improved it\n", + "\n", + "import re\n", + "from collections import Counter\n", + "\n", + "# assumed Norvig-style word tokenizer; tokens() below needs this defined\n", + "REGEX_TOKEN = re.compile(r'\\w+')\n", + "\n", + "class SpellCorrector:\n", + " \"\"\"\n", + " The SpellCorrector extends the functionality of Peter Norvig's\n", + " spell-corrector in http://norvig.com/spell-correct.html\n", + " \"\"\"\n", + "\n", + " def __init__(self):\n", + " \"\"\"\n", + " :param corpus: the statistics from which corpus to use for the spell correction.\n", + " \"\"\"\n", + " super().__init__()\n", + " self.WORDS = words\n", + " self.N = sum(self.WORDS.values())\n", + " \n", + " @staticmethod\n", + " def tokens(text):\n", + " return REGEX_TOKEN.findall(text.lower())\n", + "\n", + " def P(self, word):\n", + " \"\"\"\n", + " Probability of `word`.\n", + " \"\"\"\n", + " return self.WORDS[word] / self.N\n", + "\n", + " def most_probable(self, words):\n", + " _known = self.known(words)\n", + " if _known:\n", + " return max(_known, key=self.P)\n", + " else:\n", + " return []\n", + "\n", + " @staticmethod\n", + " def edit_step(word):\n", + " \"\"\"\n", + " All edits that are one edit away from `word`.\n", + " \"\"\"\n", + " letters = 'abcdefghijklmnopqrstuvwxyz'\n", + " splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n", + " deletes = [L + R[1:] for L, R in splits if R]\n", + " transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]\n", + " replaces = [L + c + R[1:] for L, R in splits if R for c in letters]\n", + " inserts = [L + c + R for L, R in splits for c in letters]\n", + " return set(deletes + transposes + replaces + inserts)\n", + "\n", + " def edits2(self, word):\n", + " \"\"\"\n", + " All edits that are two edits away from `word`.\n", + " \"\"\"\n", + " return (e2 for e1 in self.edit_step(word)\n", + " for e2 in self.edit_step(e1))\n", + "\n", + " def known(self, words):\n", + " \"\"\"\n", + " The subset of `words` that appear in the dictionary of WORDS.\n", + " \"\"\"\n", + 
" return set(w for w in words if w in self.WORDS)\n", + "\n", + " def edit_candidates(self, word, assume_wrong=False, fast=True):\n", + " \"\"\"\n", + " Generate possible spelling corrections for word.\n", + " \"\"\"\n", + "\n", + " if fast:\n", + " ttt = self.known(self.edit_step(word)) or {word}\n", + " else:\n", + " ttt = self.known(self.edit_step(word)) or self.known(self.edits2(word)) or {word}\n", + " \n", + " ttt = self.known([word]) | ttt\n", + " return list(ttt)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "corrector = SpellCorrector()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['working', 'wolfing', 'walking', 'wilking', 'woking', 'wonking']" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "possible_states = corrector.edit_candidates('wolking')\n", + "possible_states" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'**mask** is good for health'" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = 'wolking is good for health'\n", + "text_mask = text.replace('wolking', '**mask**')\n", + "text_mask" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "PADDING_TEXT = \"\"\"\n", + " The quick brown fox jumps over the lazy dog. A horrible, messy split second presents\n", + " itself to the heart-shaped version as Scott is moved. The upcoming movie benefits at \n", + " the mental cost of ages 14 to 12. Nothing substantial is happened for almost 48 days. \n", + " When that happens, we lose our heart. 
\n", + "\"\"\"\n", + "padded_text = tokenize_fn(PADDING_TEXT)\n", + "\n", + "def tokens_to_masked_ids(tokens, mask_ind):\n", + " masked_tokens = tokens\n", + " masked_tokens[mask_ind] = MASK_ID\n", + " segment_id = [SEG_ID_A] * len(masked_tokens)\n", + " input_mask = [0] * len(masked_tokens)\n", + " perm_masks = np.zeros((1, len(masked_tokens)))\n", + " perm_masks[0, mask_ind] = 1.0\n", + " target_mappings = np.zeros((1, len(masked_tokens)))\n", + " target_mappings[0, mask_ind] = 1.0\n", + " \n", + " return masked_tokens, segment_id, input_mask, perm_masks, target_mappings" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['working is good for health',\n", + " 'wolfing is good for health',\n", + " 'walking is good for health',\n", + " 'wilking is good for health',\n", + " 'woking is good for health',\n", + " 'wonking is good for health']" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "replaced_masks = [text_mask.replace('**mask**', state) for state in possible_states]\n", + "replaced_masks" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/testing/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:70: The name tf.gfile.Open is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) 
/ '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import xlnet\n", + "import tensorflow as tf\n", + "import model_utils\n", + "\n", + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.perm_masks = tf.placeholder(tf.float32, [None, None, None])\n", + " self.target_mappings = tf.placeholder(tf.float32, [None, None, None])\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=self.X,\n", + " seg_ids=self.segment_ids,\n", + " input_mask=self.input_masks,\n", + " perm_mask = self.perm_masks,\n", + " target_mapping = self.target_mappings\n", + " )\n", + " \n", + " output = xlnet_model.get_sequence_output()\n", + " self.output = output\n", + " lookup_table = xlnet_model.get_embedding_table()\n", + "\n", + " initializer = xlnet_model.get_initializer()\n", + " with tf.variable_scope('model', reuse = tf.AUTO_REUSE):\n", + " with tf.variable_scope('lm_loss'):\n", + " softmax_w = lookup_table\n", + " softmax_b = tf.get_variable(\n", + " 'bias',\n", + " [xlnet_config.n_token],\n", + " dtype = output.dtype,\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.einsum('ibd,nd->ibn', output, softmax_w) + softmax_b\n", 
+ " self.logits = logits" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:686: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:693: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:797: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:99: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from 
tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 5, 32000)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokens = tokenize_fn(replaced_masks[0])\n", + "input_ids = [tokens_to_masked_ids(tokens, i) for i in range(len(tokens))]\n", + "a = list(zip(*input_ids))\n", + "batch_x = np.array(a[0])\n", + "batch_segment = np.array(a[1])\n", + "batch_mask = np.array(a[2])\n", + "perm_masks = np.array(a[3])\n", + "target_mappings = np.array(a[4])\n", + "preds = sess.run(tf.nn.softmax(model.logits), \n", + " feed_dict = {model.X: batch_x, \n", + " model.segment_ids: batch_segment,\n", + " model.input_masks: batch_mask,\n", + " model.perm_masks: perm_masks,\n", + " model.target_mappings: target_mappings})\n", + "preds.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "def get_score(mask):\n", + " tokens = tokenize_fn(mask)\n", + " input_ids = [tokens_to_masked_ids(tokens, i) for i in range(len(tokens))]\n", + " a = list(zip(*input_ids))\n", + " batch_x = np.array(a[0])\n", + " batch_segment = np.array(a[1])\n", + " batch_mask = np.array(a[2])\n", + " perm_masks = np.array(a[3])\n", + " target_mappings = np.array(a[4])\n", + " preds = sess.run(tf.nn.log_softmax(model.logits), \n", + " feed_dict = {model.X: batch_x, 
\n", + " model.segment_ids: batch_segment,\n", + " model.input_masks: batch_mask,\n", + " model.perm_masks: perm_masks,\n", + " model.target_mappings: target_mappings})\n", + " tokens_ids = tokens\n", + " preds = preds.astype('float64')\n", + " return np.sum([preds[i, i, x] for i, x in enumerate(tokens_ids)])" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[-100.41720771789551,\n", + " -120.50064468383789,\n", + " -100.41720771789551,\n", + " -140.5840950012207,\n", + " -140.5840950012207,\n", + " -120.50064468383789]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores = [get_score(mask) for mask in replaced_masks]\n", + "scores" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "scores = np.exp(np.array(scores).astype('float64'))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([4.99999999e-01, 9.48078180e-10, 4.99999999e-01, 1.79768047e-18,\n", + " 1.79768047e-18, 9.48078180e-10])" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prob_scores = np.array(scores) / np.sum(scores)\n", + "prob_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[('working', 0.49999999905192183),\n", + " ('wolfing', 9.480781803042756e-10),\n", + " ('walking', 0.49999999905192183),\n", + " ('wilking', 1.7976804735628786e-18),\n", + " ('woking', 1.7976804735628786e-18),\n", + " ('wonking', 9.480781803042756e-10)]" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(zip(possible_states, prob_scores))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, 
+ "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/spelling-correction/3.bert-base-fast.ipynb b/spelling-correction/3.bert-base-fast.ipynb new file mode 100644 index 0000000..73b717c --- /dev/null +++ b/spelling-correction/3.bert-base-fast.ipynb @@ -0,0 +1,971 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# data from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/utils/helpers.py\n", + "# reuploaded to husein's S3\n", + "# !wget https://malaya-dataset.s3-ap-southeast-1.amazonaws.com/counts_1grams.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "with open('counts_1grams.txt') as fopen:\n", + " f = fopen.read().split('\\n')[:-1]\n", + " \n", + "words = {}\n", + "for l in f:\n", + " w, c = l.split('\\t')\n", + " c = int(c)\n", + " words[w] = c + words.get(w, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# original from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/classes/spellcorrect.py\n", + "# improved it\n", + "\n", + "import re\n", + "from collections import Counter\n", + "\n", + "class SpellCorrector:\n", + " \"\"\"\n", + " The SpellCorrector extends the functionality of the Peter Norvig's\n", + " spell-corrector in http://norvig.com/spell-correct.html\n", + " \"\"\"\n", + "\n", + " def __init__(self):\n", + " \"\"\"\n", + " :param corpus: the statistics from which corpus to use for the spell correction.\n", + " 
\"\"\"\n", + " super().__init__()\n", + " self.WORDS = words\n", + " self.N = sum(self.WORDS.values())\n", + " \n", + " @staticmethod\n", + " def tokens(text):\n", + " return REGEX_TOKEN.findall(text.lower())\n", + "\n", + " def P(self, word):\n", + " \"\"\"\n", + " Probability of `word`.\n", + " \"\"\"\n", + " return self.WORDS[word] / self.N\n", + "\n", + " def most_probable(self, words):\n", + " _known = self.known(words)\n", + " if _known:\n", + " return max(_known, key=self.P)\n", + " else:\n", + " return []\n", + "\n", + " @staticmethod\n", + " def edit_step(word):\n", + " \"\"\"\n", + " All edits that are one edit away from `word`.\n", + " \"\"\"\n", + " letters = 'abcdefghijklmnopqrstuvwxyz'\n", + " splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n", + " deletes = [L + R[1:] for L, R in splits if R]\n", + " transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]\n", + " replaces = [L + c + R[1:] for L, R in splits if R for c in letters]\n", + " inserts = [L + c + R for L, R in splits for c in letters]\n", + " return set(deletes + transposes + replaces + inserts)\n", + "\n", + " def edits2(self, word):\n", + " \"\"\"\n", + " All edits that are two edits away from `word`.\n", + " \"\"\"\n", + " return (e2 for e1 in self.edit_step(word)\n", + " for e2 in self.edit_step(e1))\n", + "\n", + " def known(self, words):\n", + " \"\"\"\n", + " The subset of `words` that appear in the dictionary of WORDS.\n", + " \"\"\"\n", + " return set(w for w in words if w in self.WORDS)\n", + "\n", + " def edit_candidates(self, word, assume_wrong=False, fast=True):\n", + " \"\"\"\n", + " Generate possible spelling corrections for word.\n", + " \"\"\"\n", + "\n", + " if fast:\n", + " ttt = self.known(self.edit_step(word)) or {word}\n", + " else:\n", + " ttt = self.known(self.edit_step(word)) or self.known(self.edits2(word)) or {word}\n", + " \n", + " ttt = self.known([word]) | ttt\n", + " return list(ttt)" + ] + }, + { + "cell_type": "code", + 
"execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "corrector = SpellCorrector()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['etling',\n", + " 'etting',\n", + " 'ewing',\n", + " 'meting',\n", + " 'etang',\n", + " 'beting',\n", + " 'enting',\n", + " 'edting',\n", + " 'eing',\n", + " 'sting',\n", + " 'ting',\n", + " 'eying',\n", + " 'eting',\n", + " 'reting',\n", + " 'ering',\n", + " 'kting',\n", + " 'epting',\n", + " 'ebing',\n", + " 'geting',\n", + " 'etin',\n", + " 'ating',\n", + " 'eating',\n", + " 'elting',\n", + " 'eking',\n", + " 'eling']" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "possible_states = corrector.edit_candidates('eting')\n", + "possible_states" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' 
as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as 
(type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "import unicodedata\n", + "\n", + "def whitespace_tokenize(text):\n", + " text = text.strip()\n", + " if not text:\n", + " return []\n", + " tokens = text.split()\n", + " return tokens\n", + "\n", + "class BasicTokenizer(object):\n", + "\n", + " def __init__(self, do_lower_case=True, never_split=None):\n", + " if never_split is None:\n", + " never_split = []\n", + " self.do_lower_case = do_lower_case\n", + " self.never_split = never_split\n", + "\n", + " def tokenize(self, text, never_split=None):\n", + " never_split = self.never_split + (never_split if never_split is not None else [])\n", + " text = self._clean_text(text)\n", + " orig_tokens = whitespace_tokenize(text)\n", + " split_tokens = []\n", + " for token in orig_tokens:\n", + " if token not in never_split:\n", + " if self.do_lower_case:\n", + " token = token.lower()\n", + " token = self._run_strip_accents(token)\n", + " split_tokens.extend(self._run_split_on_punc(token))\n", + " else:\n", + " split_tokens.append(token)\n", + "\n", + " output_tokens = whitespace_tokenize(\" \".join(split_tokens))\n", + " return output_tokens\n", + "\n", + " def _run_strip_accents(self, text):\n", + " text = unicodedata.normalize(\"NFD\", text)\n", + " output = []\n", + " for char in text:\n", + " cat = unicodedata.category(char)\n", + " if cat == \"Mn\":\n", + " continue\n", + " output.append(char)\n", + " return \"\".join(output)\n", + "\n", + " def _run_split_on_punc(self, text, never_split=None):\n", + " if never_split is not None and text in never_split:\n", + " return [text]\n", 
+ " chars = list(text)\n", + " i = 0\n", + " start_new_word = True\n", + " output = []\n", + " while i < len(chars):\n", + " char = chars[i]\n", + " if _is_punctuation(char):\n", + " output.append([char])\n", + " start_new_word = True\n", + " else:\n", + " if start_new_word:\n", + " output.append([])\n", + " start_new_word = False\n", + " output[-1].append(char)\n", + " i += 1\n", + "\n", + " return [\"\".join(x) for x in output]\n", + " \n", + " def _clean_text(self, text):\n", + " output = []\n", + " for char in text:\n", + " cp = ord(char)\n", + " if cp == 0 or cp == 0xfffd or _is_control(char):\n", + " continue\n", + " if _is_whitespace(char):\n", + " output.append(\" \")\n", + " else:\n", + " output.append(char)\n", + " return \"\".join(output)\n", + " \n", + "def _is_control(char):\n", + " if char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n", + " return False\n", + " cat = unicodedata.category(char)\n", + " if cat.startswith(\"C\"):\n", + " return True\n", + " return False\n", + "\n", + "def _is_whitespace(char):\n", + " if char == \" \" or char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n", + " return True\n", + " cat = unicodedata.category(char)\n", + " if cat == \"Zs\":\n", + " return True\n", + " return False\n", + "\n", + "def _is_punctuation(char):\n", + " cp = ord(char)\n", + " if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or\n", + " (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):\n", + " return True\n", + " cat = unicodedata.category(char)\n", + " if cat.startswith(\"P\"):\n", + " return True\n", + " return False" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from bert.tokenization import WordpieceTokenizer, load_vocab, convert_by_vocab\n", + "\n", + "class FullTokenizer(object):\n", + " def __init__(self, vocab_file, do_lower_case=True):\n", + " self.vocab = load_vocab(vocab_file)\n", + " self.inv_vocab = {v: k for k, v in self.vocab.items()}\n", + " 
self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case, \n", + " never_split = ['[CLS]', '[MASK]', '[SEP]'])\n", + " self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)\n", + "\n", + " def tokenize(self, text):\n", + " split_tokens = []\n", + " for token in self.basic_tokenizer.tokenize(text):\n", + " for sub_token in self.wordpiece_tokenizer.tokenize(token):\n", + " split_tokens.append(sub_token)\n", + "\n", + " return split_tokens\n", + "\n", + " def convert_tokens_to_ids(self, tokens):\n", + " return convert_by_vocab(self.vocab, tokens)\n", + "\n", + " def convert_ids_to_tokens(self, ids):\n", + " return convert_by_vocab(self.inv_vocab, ids)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "tokenizer = FullTokenizer(vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'[CLS] scientist suggests **mask** burger can lead to obesity [SEP]'" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = '[CLS] scientist suggests eting burger can lead to obesity [SEP]'\n", + "text_mask = text.replace('eting', '**mask**')\n", + "text_mask" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "def get_indices(mask, word):\n", + " splitted = mask.split('**mask**')\n", + " left = tokenizer.tokenize(splitted[0])\n", + " middle = tokenizer.tokenize(word)\n", + " right = tokenizer.tokenize(splitted[1])\n", + " indices = [i for i in range(len(left))]\n", + " for i in range(len(right)):\n", + " 
indices.append(i + len(middle) + len(left))\n", + " \n", + " indices = indices[1:-1]\n", + " tokenized = tokenizer.tokenize(mask.replace('**mask**',word))\n", + " ids = tokenizer.convert_tokens_to_ids(tokenized)\n", + " ids_left = tokenizer.convert_tokens_to_ids(left)\n", + " ids_right = tokenizer.convert_tokens_to_ids(right)\n", + " indices_word = ids_left + ids_right\n", + " return ids, indices, indices_word[1:-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "([101, 7155, 6083, 3802, 2989, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3802, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 24023, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 2777, 2075, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 27859, 3070, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 6655, 2075, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 4372, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 16417, 2290, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 12072, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 28642, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 1041, 14147, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3802, 2075, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 2128, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 11781, 2290, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 1047, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 4958, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 1041, 10472, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 2131, 2075, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 
6083, 3802, 2378, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 2012, 2075, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 5983, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3449, 3436, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 23174, 3070, 15890, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 12005, 3070, 15890, 2064, 2599, 2000, 24552, 102])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indices = [get_indices(text_mask, word) for word in possible_states]\n", + "ids, seq_ids, word_ids = list(zip(*indices))\n", + "ids" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.variable_scope('cls/predictions'):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " output_layer,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + " \n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = 
tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " self.logits = tf.nn.bias_add(logits, output_bias)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. 
Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " ,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cls = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'cls')\n", + "cls" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = var_lists + cls)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(25, 11)" 
+ ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "masked_padded = tf.keras.preprocessing.sequence.pad_sequences(ids,padding='post')\n", + "masked_padded.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(25, 11, 30522)" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "preds = sess.run(tf.nn.softmax(model.logits), feed_dict = {model.X: masked_padded})\n", + "preds.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[2.689204e-05,\n", + " 3.36932e-05,\n", + " 4.1889663e-05,\n", + " 2.2712533e-05,\n", + " 3.127968e-05,\n", + " 1.5012656e-05,\n", + " 3.465448e-05,\n", + " 5.2485917e-05,\n", + " 7.6286415e-05,\n", + " 3.5186342e-05,\n", + " 1.9021903e-05,\n", + " 3.2630334e-05,\n", + " 1.2884642e-05,\n", + " 4.779812e-05,\n", + " 9.0476e-05,\n", + " 3.1589767e-05,\n", + " 4.9742277e-05,\n", + " 4.847102e-05,\n", + " 3.391391e-05,\n", + " 1.6768146e-05,\n", + " 1.9604393e-05,\n", + " 9.826456e-05,\n", + " 3.581642e-05,\n", + " 4.2474054e-05,\n", + " 4.910487e-05]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scores = []\n", + "\n", + "for no, ids in enumerate(seq_ids):\n", + " scores.append(np.prod(preds[no, ids, word_ids[no]]))\n", + " \n", + "scores" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0.0269283 , 0.03373863, 0.04194615, 0.02274316, 0.03132186,\n", + " 0.0150329 , 0.03470121, 0.05255669, 0.07638928, 0.03523379,\n", + " 0.01904755, 0.03267433, 0.01290202, 0.04786257, 0.090598 ,\n", + " 0.03163236, 0.04980935, 0.04853638, 0.03395964, 0.01679076,\n", + " 0.01963083, 0.09839706, 0.03586472, 0.04253133, 
0.04917109],\n", + " dtype=float32)" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prob_scores = np.array(scores) / np.sum(scores)\n", + "prob_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[('eating', 0.09839706),\n", + " ('ering', 0.090598),\n", + " ('eing', 0.07638928),\n", + " ('edting', 0.052556694),\n", + " ('epting', 0.04980935),\n", + " ('eling', 0.049171086),\n", + " ('ebing', 0.048536383),\n", + " ('reting', 0.047862574),\n", + " ('eking', 0.04253133),\n", + " ('ewing', 0.04194615),\n", + " ('elting', 0.03586472),\n", + " ('sting', 0.03523379),\n", + " ('enting', 0.03470121),\n", + " ('geting', 0.033959642),\n", + " ('etting', 0.03373863),\n", + " ('eying', 0.032674335),\n", + " ('kting', 0.031632364),\n", + " ('etang', 0.03132186),\n", + " ('etling', 0.026928302),\n", + " ('meting', 0.02274316),\n", + " ('ating', 0.019630829),\n", + " ('ting', 0.019047555),\n", + " ('etin', 0.016790757),\n", + " ('beting', 0.0150329005),\n", + " ('eting', 0.012902017)]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "probs = list(zip(possible_states, prob_scores))\n", + "probs.sort(key = lambda x: x[1]) \n", + "probs[::-1]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/spelling-correction/4.bert-accurate.ipynb b/spelling-correction/4.bert-accurate.ipynb new 
file mode 100644 index 0000000..1c8f675 --- /dev/null +++ b/spelling-correction/4.bert-accurate.ipynb @@ -0,0 +1,943 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# data from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/utils/helpers.py\n", + "# reuploaded to husein's S3\n", + "# !wget https://malaya-dataset.s3-ap-southeast-1.amazonaws.com/counts_1grams.txt" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('counts_1grams.txt') as fopen:\n", + " f = fopen.read().split('\\n')[:-1]\n", + " \n", + "words = {}\n", + "for l in f:\n", + " w, c = l.split('\\t')\n", + " c = int(c)\n", + " words[w] = c + words.get(w, 0)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# original from https://github.com/cbaziotis/ekphrasis/blob/master/ekphrasis/classes/spellcorrect.py\n", + "# improved it\n", + "\n", + "import re\n", + "from collections import Counter\n", + "\n", + "class SpellCorrector:\n", + " \"\"\"\n", + " The SpellCorrector extends the functionality of the Peter Norvig's\n", + " spell-corrector in http://norvig.com/spell-correct.html\n", + " \"\"\"\n", + "\n", + " def __init__(self):\n", + " \"\"\"\n", + " :param corpus: the statistics from which corpus to use for the spell correction.\n", + " \"\"\"\n", + " super().__init__()\n", + " self.WORDS = words\n", + " self.N = sum(self.WORDS.values())\n", + " \n", + " @staticmethod\n", + " def tokens(text):\n", + " return REGEX_TOKEN.findall(text.lower())\n", + "\n", + " def P(self, word):\n", + " \"\"\"\n", + " Probability of `word`.\n", + " \"\"\"\n", + " return self.WORDS[word] / self.N\n", + "\n", + " def most_probable(self, 
words):\n", + " _known = self.known(words)\n", + " if _known:\n", + " return max(_known, key=self.P)\n", + " else:\n", + " return []\n", + "\n", + " @staticmethod\n", + " def edit_step(word):\n", + " \"\"\"\n", + " All edits that are one edit away from `word`.\n", + " \"\"\"\n", + " letters = 'abcdefghijklmnopqrstuvwxyz'\n", + " splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n", + " deletes = [L + R[1:] for L, R in splits if R]\n", + " transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]\n", + " replaces = [L + c + R[1:] for L, R in splits if R for c in letters]\n", + " inserts = [L + c + R for L, R in splits for c in letters]\n", + " return set(deletes + transposes + replaces + inserts)\n", + "\n", + " def edits2(self, word):\n", + " \"\"\"\n", + " All edits that are two edits away from `word`.\n", + " \"\"\"\n", + " return (e2 for e1 in self.edit_step(word)\n", + " for e2 in self.edit_step(e1))\n", + "\n", + " def known(self, words):\n", + " \"\"\"\n", + " The subset of `words` that appear in the dictionary of WORDS.\n", + " \"\"\"\n", + " return set(w for w in words if w in self.WORDS)\n", + "\n", + " def edit_candidates(self, word, assume_wrong=False, fast=True):\n", + " \"\"\"\n", + " Generate possible spelling corrections for word.\n", + " \"\"\"\n", + "\n", + " if fast:\n", + " ttt = self.known(self.edit_step(word)) or {word}\n", + " else:\n", + " ttt = self.known(self.edit_step(word)) or self.known(self.edits2(word)) or {word}\n", + " \n", + " ttt = self.known([word]) | ttt\n", + " return list(ttt)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "corrector = SpellCorrector()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['edting',\n", + " 'reting',\n", + " 'etang',\n", + " 'eling',\n", + " 'beting',\n", + " 'eating',\n", + " 'ering',\n", + " 'eking',\n", + " 'ebing',\n", + " 
'eting',\n", + " 'geting',\n", + " 'etting',\n", + " 'ating',\n", + " 'enting',\n", + " 'eying',\n", + " 'meting',\n", + " 'epting',\n", + " 'etling',\n", + " 'ting',\n", + " 'sting',\n", + " 'elting',\n", + " 'eing',\n", + " 'etin',\n", + " 'kting',\n", + " 'ewing']" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "possible_states = corrector.edit_candidates('eting')\n", + "possible_states" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it 
will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as 
(type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "tokenization.validate_case_matches_checkpoint(True,BERT_INIT_CHKPNT)\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'scientist suggests **mask** berger can lead to obesity'" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = 'scientist suggests eting berger can lead to obesity'\n", + "text_mask = text.replace('eting', '**mask**')\n", + "text_mask" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "def tokens_to_masked_ids(tokens, mask_ind):\n", + " masked_tokens = tokens[:]\n", + " masked_tokens[mask_ind] = \"[MASK]\"\n", + " masked_tokens = [\"[CLS]\"] + masked_tokens + [\"[SEP]\"]\n", + " masked_ids = tokenizer.convert_tokens_to_ids(masked_tokens)\n", + " return masked_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + 
"class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.variable_scope('cls/predictions'):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " output_layer,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + " \n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " self.logits = tf.nn.bias_add(logits, output_bias)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. 
Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " ,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cls = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'cls')\n", + "cls" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = var_lists + cls)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['scientist suggests edting berger can lead to obesity',\n", + " 'scientist suggests reting berger can lead to obesity',\n", + " 'scientist suggests etang berger can lead to obesity',\n", + " 'scientist suggests eling berger can lead to obesity',\n", + " 'scientist suggests beting berger can lead to obesity',\n", + " 'scientist suggests eating berger can lead to obesity',\n", + " 'scientist suggests ering berger can lead 
to obesity',\n", + " 'scientist suggests eking berger can lead to obesity',\n", + " 'scientist suggests ebing berger can lead to obesity',\n", + " 'scientist suggests eting berger can lead to obesity',\n", + " 'scientist suggests geting berger can lead to obesity',\n", + " 'scientist suggests etting berger can lead to obesity',\n", + " 'scientist suggests ating berger can lead to obesity',\n", + " 'scientist suggests enting berger can lead to obesity',\n", + " 'scientist suggests eying berger can lead to obesity',\n", + " 'scientist suggests meting berger can lead to obesity',\n", + " 'scientist suggests epting berger can lead to obesity',\n", + " 'scientist suggests etling berger can lead to obesity',\n", + " 'scientist suggests ting berger can lead to obesity',\n", + " 'scientist suggests sting berger can lead to obesity',\n", + " 'scientist suggests elting berger can lead to obesity',\n", + " 'scientist suggests eing berger can lead to obesity',\n", + " 'scientist suggests etin berger can lead to obesity',\n", + " 'scientist suggests kting berger can lead to obesity',\n", + " 'scientist suggests ewing berger can lead to obesity']" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "replaced_masks = [text_mask.replace('**mask**', state) for state in possible_states]\n", + "replaced_masks" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[101, 103, 6083, 3968, 3436, 16758, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 103, 3968, 3436, 16758, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 103, 3436, 16758, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 103, 16758, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 103, 2064, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 16758, 103, 2599, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 16758, 2064, 
103, 2000, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 16758, 2064, 2599, 103, 24552, 102],\n", + " [101, 7155, 6083, 3968, 3436, 16758, 2064, 2599, 2000, 103, 102]]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokens = tokenizer.tokenize(replaced_masks[0])\n", + "input_ids = [tokens_to_masked_ids(tokens, i) for i in range(len(tokens))]\n", + "input_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[7155, 6083, 3968, 3436, 16758, 2064, 2599, 2000, 24552]" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokens_ids = tokenizer.convert_tokens_to_ids(tokens)\n", + "tokens_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_ids(mask):\n", + " tokens = tokenizer.tokenize(mask)\n", + " input_ids = [tokens_to_masked_ids(tokens, i) for i in range(len(tokens))]\n", + " tokens_ids = tokenizer.convert_tokens_to_ids(tokens)\n", + " return tokens, input_ids, tokens_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "ids = [generate_ids(mask) for mask in replaced_masks]\n", + "tokens, input_ids, tokens_ids = list(zip(*ids))" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "indices, ids = [], []\n", + "for i in range(len(input_ids)):\n", + " indices.extend([i] * len(input_ids[i]))\n", + " ids.extend(input_ids[i])" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[101, 103, 6083, 3968, 3436, 16758, 2064, 2599, 2000, 24552, 102]" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ids[0]" + ] + }, + { + "cell_type": 
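The `tokens_to_masked_ids` helper these cells call is defined outside this chunk. A minimal self-contained sketch consistent with the printed output would look like the following — note this is a hedged reconstruction, not the notebook's actual helper (which masks a wordpiece *token* before vocabulary lookup; here we mask ids directly so no tokenizer is needed). Ids 101, 102, and 103 are the standard uncased-BERT `[CLS]`, `[SEP]`, and `[MASK]` vocabulary ids:

```python
# Standard uncased-BERT special-token ids.
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103

def masked_copies(token_ids):
    # For an n-token sentence, build n copies of the id sequence,
    # each with exactly one position replaced by [MASK] and the
    # whole sequence wrapped in [CLS] ... [SEP].
    copies = []
    for i in range(len(token_ids)):
        ids = list(token_ids)
        ids[i] = MASK_ID          # hide exactly one token per copy
        copies.append([CLS_ID] + ids + [SEP_ID])
    return copies

# First five wordpiece ids of "scientist suggests ..." from the output above.
copies = masked_copies([7155, 6083, 3968, 3436, 16758])
# copies[0] -> [101, 103, 6083, 3968, 3436, 16758, 102]
```

Each candidate sentence therefore expands into as many model inputs as it has tokens, which is why the batch flattened in the next cells has 221 rows rather than 25.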
"code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(221, 11)" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "masked_padded = tf.keras.preprocessing.sequence.pad_sequences(ids,padding='post')\n", + "masked_padded.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(221, 11, 30522)" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "preds = sess.run(tf.nn.log_softmax(model.logits), feed_dict = {model.X: masked_padded})\n", + "preds.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[-70.87423,\n", + " -63.164944,\n", + " -62.3369,\n", + " -63.397655,\n", + " -69.86493,\n", + " -45.841267,\n", + " -62.576523,\n", + " -57.582092,\n", + " -73.42107,\n", + " -71.33391,\n", + " -70.08537,\n", + " -67.14623,\n", + " -67.53539,\n", + " -62.374245,\n", + " -61.71485,\n", + " -60.225086,\n", + " -73.1943,\n", + " -73.97394,\n", + " -67.466835,\n", + " -63.56203,\n", + " -67.8916,\n", + " -65.7337,\n", + " -67.74832,\n", + " -73.778435,\n", + " -62.557587]" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indices = np.array(indices)\n", + "scores = []\n", + "\n", + "for i in range(len(tokens)):\n", + " filter_preds = preds[indices == i]\n", + " total = np.sum([filter_preds[k, k + 1, x] for k, x in enumerate(tokens_ids[i])])\n", + " scores.append(total)\n", + " \n", + "scores" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([0.04307465, 0.03838924, 0.03788599, 0.03853067, 0.04246124,\n", + " 0.02786057, 0.03803162, 0.0349962 , 0.04462252, 0.04335402,\n", + " 0.04259521, 0.04080892, 
0.04104543, 0.03790868, 0.03750793,\n", + " 0.03660251, 0.0444847 , 0.04495853, 0.04100376, 0.03863057,\n", + " 0.04126192, 0.03995043, 0.04117484, 0.04483971, 0.03802011],\n", + " dtype=float32)" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prob_scores = np.array(scores) / np.sum(scores)\n", + "prob_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[('eating', 0.02786057),\n", + " ('eking', 0.034996197),\n", + " ('meting', 0.03660251),\n", + " ('eying', 0.03750793),\n", + " ('etang', 0.037885986),\n", + " ('enting', 0.037908684),\n", + " ('ewing', 0.03802011),\n", + " ('ering', 0.03803162),\n", + " ('reting', 0.03838924),\n", + " ('eling', 0.038530674),\n", + " ('sting', 0.038630575),\n", + " ('eing', 0.039950434),\n", + " ('etting', 0.040808916),\n", + " ('ting', 0.041003764),\n", + " ('ating', 0.04104543),\n", + " ('etin', 0.04117484),\n", + " ('elting', 0.041261923),\n", + " ('beting', 0.042461235),\n", + " ('geting', 0.04259521),\n", + " ('edting', 0.04307465),\n", + " ('eting', 0.043354023),\n", + " ('epting', 0.044484697),\n", + " ('ebing', 0.044622518),\n", + " ('kting', 0.044839714),\n", + " ('etling', 0.04495853)]" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "probs = list(zip(possible_states, prob_scores))\n", + "probs.sort(key = lambda x: x[1]) \n", + "probs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" 
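The scoring arithmetic in the cells above — sum each true token's log-probability at its own masked position (the `k + 1` offset skips the leading `[CLS]` row), then normalize across candidates — can be sketched in isolation. This illustration uses randomly generated log-softmax outputs in place of the model's `preds`, with a toy vocabulary; the candidate ids are hypothetical stand-ins, but the shapes mirror the `(copies, seq_len, vocab)` layout used in the notebook:

```python
import numpy as np

def pll_score(token_ids, preds):
    # Pseudo-log-likelihood: log-prob of each true token at its own
    # masked position; k + 1 offsets the leading [CLS] row.
    return sum(preds[k, k + 1, tok] for k, tok in enumerate(token_ids))

rng = np.random.default_rng(0)
vocab = 50
# Two 3-token candidates (hypothetical ids), 3 masked copies apiece,
# sequence length 5 = [CLS] + 3 tokens + [SEP].
candidates = {'eating': [7, 3, 9], 'edting': [7, 4, 9]}
scores = []
for ids in candidates.values():
    # Stand-in for sess.run(tf.nn.log_softmax(...)): random log-probs.
    preds = np.log(rng.dirichlet(np.ones(vocab), size=(len(ids), 5)))
    scores.append(pll_score(ids, preds))

prob_scores = np.array(scores) / np.sum(scores)
```

Because each score is a sum of log-probabilities it is negative, and dividing by the (also negative) total flips the ordering: the *smallest* normalized value marks the most likely candidate, which is why the ascending sort in the last cell puts the correct spelling first.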
+ } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-augmentation/7.bert-base.ipynb b/text-augmentation/7.bert-base.ipynb new file mode 100644 index 0000000..2e905d1 --- /dev/null +++ b/text-augmentation/7.bert-base.ipynb @@ -0,0 +1,857 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", + "# !unzip uncased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. 
Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as 
(type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "tokenization.validate_case_matches_checkpoint(True,BERT_INIT_CHKPNT)\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = 'A politician is a person active in party politics, or a person holding or seeking office in government. 
Politicians propose, support and create laws or policies that govern the land and, by extension, its people'\n", + "text" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "28" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "aug_percent = 0.8\n", + "splitted = text.split()\n", + "size = len(splitted)\n", + "cnt = int(aug_percent * size)\n", + "cnt" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/6/stopwords-json/master/dist/en.json" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "with open('en.json') as fopen:\n", + " stopwords = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[34, 20, 8, 17, 18, 4, 29, 14, 1, 32, 0, 15, 7, 25, 5, 11, 22, 27]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import random\n", + "import string\n", + "\n", + "results = []\n", + "samples = random.sample([i for i in range(size)], cnt)\n", + "for token_idx, token in enumerate(samples):\n", + " if splitted[token] in string.punctuation:\n", + " continue\n", + " if splitted[token] in stopwords:\n", + " continue\n", + " results.append(token)\n", + " \n", + "results" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "import unicodedata\n", + "\n", + "def whitespace_tokenize(text):\n", + " text = text.strip()\n", + " if not text:\n", + " return []\n", + " tokens = text.split()\n", + " return tokens\n", + "\n", + "class BasicTokenizer(object):\n", + "\n", + " def __init__(self, do_lower_case=True, never_split=None):\n", + " if never_split is 
None:\n", + " never_split = []\n", + " self.do_lower_case = do_lower_case\n", + " self.never_split = never_split\n", + "\n", + " def tokenize(self, text, never_split=None):\n", + " never_split = self.never_split + (never_split if never_split is not None else [])\n", + " text = self._clean_text(text)\n", + " orig_tokens = whitespace_tokenize(text)\n", + " split_tokens = []\n", + " for token in orig_tokens:\n", + " if token not in never_split:\n", + " if self.do_lower_case:\n", + " token = token.lower()\n", + " token = self._run_strip_accents(token)\n", + " split_tokens.extend(self._run_split_on_punc(token))\n", + " else:\n", + " split_tokens.append(token)\n", + "\n", + " output_tokens = whitespace_tokenize(\" \".join(split_tokens))\n", + " return output_tokens\n", + "\n", + " def _run_strip_accents(self, text):\n", + " text = unicodedata.normalize(\"NFD\", text)\n", + " output = []\n", + " for char in text:\n", + " cat = unicodedata.category(char)\n", + " if cat == \"Mn\":\n", + " continue\n", + " output.append(char)\n", + " return \"\".join(output)\n", + "\n", + " def _run_split_on_punc(self, text, never_split=None):\n", + " if never_split is not None and text in never_split:\n", + " return [text]\n", + " chars = list(text)\n", + " i = 0\n", + " start_new_word = True\n", + " output = []\n", + " while i < len(chars):\n", + " char = chars[i]\n", + " if _is_punctuation(char):\n", + " output.append([char])\n", + " start_new_word = True\n", + " else:\n", + " if start_new_word:\n", + " output.append([])\n", + " start_new_word = False\n", + " output[-1].append(char)\n", + " i += 1\n", + "\n", + " return [\"\".join(x) for x in output]\n", + " \n", + " def _clean_text(self, text):\n", + " output = []\n", + " for char in text:\n", + " cp = ord(char)\n", + " if cp == 0 or cp == 0xfffd or _is_control(char):\n", + " continue\n", + " if _is_whitespace(char):\n", + " output.append(\" \")\n", + " else:\n", + " output.append(char)\n", + " return \"\".join(output)\n", + " \n", + 
"def _is_control(char):\n", + " if char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n", + " return False\n", + " cat = unicodedata.category(char)\n", + " if cat.startswith(\"C\"):\n", + " return True\n", + " return False\n", + "\n", + "def _is_whitespace(char):\n", + " if char == \" \" or char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n", + " return True\n", + " cat = unicodedata.category(char)\n", + " if cat == \"Zs\":\n", + " return True\n", + " return False\n", + "\n", + "def _is_punctuation(char):\n", + " cp = ord(char)\n", + " if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or\n", + " (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):\n", + " return True\n", + " cat = unicodedata.category(char)\n", + " if cat.startswith(\"P\"):\n", + " return True\n", + " return False" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "from bert.tokenization import WordpieceTokenizer, load_vocab, convert_by_vocab\n", + "\n", + "class FullTokenizer(object):\n", + " def __init__(self, vocab_file, do_lower_case=True):\n", + " self.vocab = load_vocab(vocab_file)\n", + " self.inv_vocab = {v: k for k, v in self.vocab.items()}\n", + " self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case, \n", + " never_split = ['[CLS]', '[MASK]', '[SEP]'])\n", + " self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)\n", + "\n", + " def tokenize(self, text):\n", + " split_tokens = []\n", + " for token in self.basic_tokenizer.tokenize(text):\n", + " for sub_token in self.wordpiece_tokenizer.tokenize(token):\n", + " split_tokens.append(sub_token)\n", + "\n", + " return split_tokens\n", + "\n", + " def convert_tokens_to_ids(self, tokens):\n", + " return convert_by_vocab(self.vocab, tokens)\n", + "\n", + " def convert_ids_to_tokens(self, ids):\n", + " return convert_by_vocab(self.inv_vocab, ids)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + 
"source": [ + "tokenizer = FullTokenizer(vocab_file=BERT_VOCAB, do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['[CLS]', 'i', 'like', 'to', 'eat', 'hu', '##sei', '##n', '[MASK]', '[SEP]']" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.tokenize('[CLS] i like to eat husein [MASK] [SEP]')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "maskeds, indices = [], []\n", + "\n", + "for index in results:\n", + " new = splitted[:]\n", + " new[index] = '[MASK]'\n", + " tokens = tokenizer.tokenize(' '.join(new))\n", + " new = [\"[CLS]\"] + tokens + [\"[SEP]\"]\n", + " mask_index = new.index('[MASK]')\n", + " masked_ids = tokenizer.convert_tokens_to_ids(new)\n", + " maskeds.append(masked_ids)\n", + " indices.append(mask_index)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def top_k_logits(logits, k):\n", + " if k == 0:\n", + " return logits\n", + "\n", + " def _top_k():\n", + " values, _ = tf.nn.top_k(logits, k=k)\n", + " min_values = values[:, -1, tf.newaxis]\n", + " return tf.where(\n", + " logits < min_values,\n", + " tf.ones_like(logits, dtype=logits.dtype) * -1e10,\n", + " logits,\n", + " )\n", + " return tf.cond(\n", + " tf.equal(k, 0),\n", + " lambda: logits,\n", + " lambda: _top_k(),\n", + " )\n", + "\n", + "\n", + "def top_p_logits(logits, p):\n", + " with tf.variable_scope('top_p_logits'):\n", + " logits_sort = tf.sort(logits, direction='DESCENDING')\n", + " probs_sort = tf.nn.softmax(logits_sort)\n", + " probs_sums = tf.cumsum(probs_sort, axis=1, exclusive=True)\n", + " logits_masked = 
tf.where(probs_sums < p, logits_sort, tf.ones_like(\n", + " logits_sort)*1000) # [batchsize, vocab]\n", + " min_logits = tf.reduce_min(logits_masked, axis=1, keepdims=True) # [batchsize, 1]\n", + " return tf.where(\n", + " logits < min_logits,\n", + " tf.ones_like(logits, dtype=logits.dtype) * -1e10,\n", + " logits,\n", + " )\n", + "\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.top_p = tf.placeholder(tf.float32, None)\n", + " self.top_k = tf.placeholder(tf.int32, None)\n", + " self.k = tf.placeholder(tf.int32, None)\n", + " self.temperature = tf.placeholder(tf.float32, None)\n", + " self.indices = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_sequence_output()\n", + " embedding = model.get_embedding_table()\n", + " \n", + " with tf.variable_scope('cls/predictions'):\n", + " with tf.variable_scope('transform'):\n", + " input_tensor = tf.layers.dense(\n", + " output_layer,\n", + " units = bert_config.hidden_size,\n", + " activation = modeling.get_activation(bert_config.hidden_act),\n", + " kernel_initializer = modeling.create_initializer(\n", + " bert_config.initializer_range\n", + " ),\n", + " )\n", + " input_tensor = modeling.layer_norm(input_tensor)\n", + " \n", + " output_bias = tf.get_variable(\n", + " 'output_bias',\n", + " shape = [bert_config.vocab_size],\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.matmul(input_tensor, embedding, transpose_b = True)\n", + " self.logits = tf.nn.bias_add(logits, output_bias)\n", + " \n", + " \n", + " logits = tf.gather_nd(self.logits, self.indices)\n", + " logits = logits / self.temperature\n", + " \n", + " # nucleus (top-p) sampling branch; used when top_p > 0\n", + " def nucleus():\n", + " return top_p_logits(logits, self.top_p)\n", + " \n", + " def 
select_k():\n", + " return top_k_logits(logits, self.top_k)\n", + " \n", + " logits = tf.cond(self.top_p > 0, nucleus, select_k)\n", + " self.samples = tf.multinomial(\n", + " logits, num_samples=self.k, output_dtype=tf.int32)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From :26: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :87: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.random.categorical` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " ,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + 
"execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cls = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'cls')\n", + "cls" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = var_lists + cls)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(18, 42)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "masked_padded = tf.keras.preprocessing.sequence.pad_sequences(maskeds,padding='post')\n", + "masked_padded.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(18, 2)" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_indices = np.array([np.arange(len(indices)), indices]).T\n", + "batch_indices.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "samples = sess.run(model.samples, feed_dict = {model.X: masked_padded,\n", + " model.top_p: 0.8,\n", + " model.top_k: 100,\n", + " model.temperature: 0.8,\n", + " model.indices: batch_indices,\n", + " model.k: 5})" + ] + }, + { + "cell_type": "code", + 
"execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SAMPLE 0\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: a politician is a person involved in national politics or a person holding or holding office in which they propose, propose and implement laws or regulations that protect the country and, by , its .\n", + "\n", + "SAMPLE 1\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: a politician is a person involved in local politics or a person holding or holding office in which they propose, propose and implement laws or regulations that protect the country and, by protecting its .\n", + "\n", + "SAMPLE 2\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: a politician is a person active in local politics or a person holding or holding office in which they propose, propose and implement laws or regulations that protect the country and, by , its .\n", + "\n", + "SAMPLE 3\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. 
Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: a politician is a person active in territorial politics or a person holding or holding office in which they propose, propose and implement laws or regulations that benefit the country and, by , its .\n", + "\n", + "SAMPLE 4\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: a politician is a person active in local politics or a person holding or holding power in which they propose, propose and implement laws or policies that protect the country and, by , its .\n", + "\n" + ] + } + ], + "source": [ + "for i in range(samples.shape[1]):\n", + " print('SAMPLE %d'%(i))\n", + " sample_i = samples[:, i]\n", + " samples_tokens = tokenizer.convert_ids_to_tokens(sample_i)\n", + " new_splitted = splitted[:]\n", + " for no, index in enumerate(results):\n", + " new_splitted[index] = samples_tokens[no]\n", + "\n", + " new = ' '.join(new_splitted)\n", + " print('BEFORE:', text)\n", + " print('AFTER:', new)\n", + " print()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-augmentation/8.xlnet-augmentation.ipynb b/text-augmentation/8.xlnet-augmentation.ipynb new file mode 100644 index 0000000..12dfcae --- /dev/null +++ b/text-augmentation/8.xlnet-augmentation.ipynb @@ -0,0 +1,820 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + 
"source": [ + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n", + "# !unzip xlnet.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = ''" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import sentencepiece as spm\n", + "from prepro_utils import preprocess_text, encode_ids\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "\n", + "def tokenize_fn(text):\n", + " text = preprocess_text(text, lower=False)\n", + " return encode_ids(sp_model, text)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP = 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + " \"<unk>\" : 0,\n", + " \"<s>\" : 1,\n", + " \"</s>\" : 2,\n", + " \"<cls>\" : 3,\n", + " \"<sep>\" : 4,\n", + " \"<pad>\" : 5,\n", + " \"<mask>\" : 6,\n", + " \"<eod>\" : 7,\n", + " \"<eop>\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"<unk>\"]\n", + "CLS_ID = special_symbols[\"<cls>\"]\n", + "SEP_ID = special_symbols[\"<sep>\"]\n", + "MASK_ID = special_symbols[\"<mask>\"]\n", + "EOD_ID = special_symbols[\"<eod>\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text = 'A politician is a person active in party politics, or a person holding or seeking office in government. 
Politicians propose, support and create laws or policies that govern the land and, by extension, its people'\n", + "text" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "28" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "aug_percent = 0.8\n", + "splitted = text.split()\n", + "size = len(splitted)\n", + "cnt = int(aug_percent * size)\n", + "cnt" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "with open('en.json') as fopen:\n", + " stopwords = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 18, 4, 25, 19, 14, 0, 23, 11, 30, 27, 17, 15, 20, 5, 8, 29, 22]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import random\n", + "import string\n", + "\n", + "results = []\n", + "samples = random.sample([i for i in range(size)], cnt)\n", + "for token_idx, token in enumerate(samples):\n", + " if splitted[token] in string.punctuation:\n", + " continue\n", + " if splitted[token] in stopwords:\n", + " continue\n", + " results.append(token)\n", + " \n", + "results" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def tokenizer(string, mask_id):\n", + " string = string.split()\n", + " ids = []\n", + " for no, word in enumerate(string):\n", + " if no == mask_id:\n", + " ids.append(MASK_ID)\n", + " ids.extend(tokenize_fn(word))\n", + " mask_ind = ids.index(MASK_ID)\n", + " segment_id = [SEG_ID_A] * len(ids)\n", + " input_mask = [0] * len(ids)\n", + " \n", + " perm_masks = np.zeros((1, len(ids)))\n", + " perm_masks[0, mask_ind] = 1.0\n", + " target_mappings = np.zeros((1, len(ids)))\n", + " 
target_mappings[0, mask_ind] = 1.0\n", + " \n", + " return ids, segment_id, input_mask, mask_ind, perm_masks, target_mappings" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing 
(type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/testing/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:70: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + 
"/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n" + ] + } + ], + "source": [ + "import xlnet\n", + "import tensorflow as tf\n", + "import model_utils\n", + "\n", + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "def top_k_logits(logits, k):\n", + " if k == 0:\n", + " return logits\n", + "\n", + " def _top_k():\n", + " values, _ = tf.nn.top_k(logits, k=k)\n", + " min_values = values[:, -1, tf.newaxis]\n", + " return tf.where(\n", + " logits < min_values,\n", + " tf.ones_like(logits, dtype=logits.dtype) * -1e10,\n", + " logits,\n", + " )\n", + " return tf.cond(\n", + " tf.equal(k, 0),\n", + " lambda: logits,\n", + " lambda: _top_k(),\n", + " )\n", + "\n", + "def top_p_logits(logits, p):\n", + " with tf.variable_scope('top_p_logits'):\n", + " logits_sort = tf.sort(logits, direction='DESCENDING')\n", + " probs_sort = tf.nn.softmax(logits_sort)\n", + " probs_sums = tf.cumsum(probs_sort, axis=1, exclusive=True)\n", + " logits_masked = 
tf.where(probs_sums < p, logits_sort, tf.ones_like(\n", + " logits_sort)*1000) # [batchsize, vocab]\n", + " min_logits = tf.reduce_min(logits_masked, axis=1, keepdims=True) # [batchsize, 1]\n", + " return tf.where(\n", + " logits < min_logits,\n", + " tf.ones_like(logits, dtype=logits.dtype) * -1e10,\n", + " logits,\n", + " )\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.perm_masks = tf.placeholder(tf.float32, [None, None, None])\n", + " self.target_mappings = tf.placeholder(tf.float32, [None, None, None])\n", + " self.top_p = tf.placeholder(tf.float32, None)\n", + " self.top_k = tf.placeholder(tf.int32, None)\n", + " self.k = tf.placeholder(tf.int32, None)\n", + " self.temperature = tf.placeholder(tf.float32, None)\n", + " self.indices = tf.placeholder(tf.int32, [None, None])\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=self.X,\n", + " seg_ids=self.segment_ids,\n", + " input_mask=self.input_masks,\n", + " perm_mask = self.perm_masks,\n", + " target_mapping = self.target_mappings\n", + " )\n", + " \n", + " output = xlnet_model.get_sequence_output()\n", + " self.output = output\n", + " lookup_table = xlnet_model.get_embedding_table()\n", + "\n", + " initializer = xlnet_model.get_initializer()\n", + " with tf.variable_scope('model', reuse = tf.AUTO_REUSE):\n", + " with tf.variable_scope('lm_loss'):\n", + " softmax_w = lookup_table\n", + " softmax_b = tf.get_variable(\n", + " 'bias',\n", + " [xlnet_config.n_token],\n", + " dtype = output.dtype,\n", + " initializer = tf.zeros_initializer(),\n", + " )\n", + " logits = tf.einsum('ibd,nd->ibn', output, softmax_w) + softmax_b\n", + " self.logits = logits\n", + " \n", + " logits = 
tf.gather_nd(self.logits, self.indices)\n", + " logits = logits / self.temperature\n", + " \n", + " def nucleus():\n", + " return top_p_logits(logits, self.top_p)\n", + " \n", + " def select_k():\n", + " return top_k_logits(logits, self.top_k)\n", + " \n", + " logits = tf.cond(self.top_p > 0, nucleus, select_k)\n", + " self.samples = tf.multinomial(\n", + " logits, num_samples=self.k, output_dtype=tf.int32)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/xlnet.py:253: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:686: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:693: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:797: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/testing/modeling.py:99: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. 
When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From :25: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From :86: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use `tf.random.categorical` instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model()\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": 
"code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "from tensorflow.keras.preprocessing.sequence import pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenized = [tokenizer(text, result) for result in results]\n", + "a = list(zip(*tokenized))\n", + "len(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "# ids, segment_id, input_mask, mask_ind, perm_masks, target_mappings\n", + "\n", + "batch_x = pad_sequences(a[0],padding='post')\n", + "batch_segment = pad_sequences(a[1],padding='post', value = SEG_ID_PAD)\n", + "batch_mask = pad_sequences(a[2],padding='post', value = 1)\n", + "perm_masks = 
pad_sequences(a[4],padding='post')\n", + "target_mappings = pad_sequences(a[5],padding='post')" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(18, 2)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "indices = a[3]\n", + "batch_indices = np.array([np.arange(len(indices)), indices]).T\n", + "batch_indices.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(18, 43)" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch_mask.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "# self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + "# self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + "# self.perm_masks = tf.placeholder(tf.float32, [None, None, None])\n", + "# self.target_mappings = tf.placeholder(tf.float32, [None, None, None])\n", + "\n", + "samples = sess.run(model.samples, feed_dict = {model.X: batch_x,\n", + " model.input_masks: batch_mask,\n", + " model.segment_ids: batch_segment,\n", + " model.perm_masks: perm_masks,\n", + " model.target_mappings: target_mappings,\n", + " model.top_p: 0.8,\n", + " model.top_k: 100,\n", + " model.temperature: 0.8,\n", + " model.indices: batch_indices,\n", + " model.k: 5})" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "def convert_ids_to_tokens(ids):\n", + " return [sp_model.IdToPiece(i) for i in ids]" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "SAMPLE 0\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in 
government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: . ly is a ▁This ly in party ▁Eventually or a s holding or ▁The ly in ▁Typically ly ▁Typically s and s s or ▁Generally that ▁These the ▁ s by extension, its people\n", + "\n", + "SAMPLE 1\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: ▁It ▁ is a ▁Zo . in party ly or a ▁concurrently holding or ▁Typically ly in s ▁upon ly s and s s or ly that ly the ▁as ▁Upon by extension, its people\n", + "\n", + "SAMPLE 2\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: ▁Typically ly is a d ▁the in party s or a ly holding or s ly in . ▁Ultimately s ly and ▁This s or ly that s the ▁Eventually s by extension, its people\n", + "\n", + "SAMPLE 3\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: , ▁Typically is a s ▁Ultimately in party ▁This or a ▁Such holding or d s in ▁The s ly s and ly ly or ▁Eventually that ▁These the ▁eventually ly by extension, its people\n", + "\n", + "SAMPLE 4\n", + "BEFORE: A politician is a person active in party politics, or a person holding or seeking office in government. Politicians propose, support and create laws or policies that govern the land and, by extension, its people\n", + "AFTER: s ly is a ▁Generally ▁Eventually in party ▁ or a s holding or . 
s in ly ▁This ▁The ly and ▁Ultimately s or ▁Typically that ▁Typically the s ▁These by extension, its people\n", + "\n" + ] + } + ], + "source": [ + "for i in range(samples.shape[1]):\n", + " print('SAMPLE %d'%(i))\n", + " sample_i = samples[:, i]\n", + " samples_tokens = convert_ids_to_tokens(samples[:, i].tolist())\n", + " new_splitted = splitted[:]\n", + " for no, index in enumerate(results):\n", + " new_splitted[index] = samples_tokens[no]\n", + "\n", + " new = ' '.join(new_splitted)\n", + " print('BEFORE:', text)\n", + " print('AFTER:', new)\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-classification/76.transfer-learning-xlnet-base.ipynb b/text-classification/76.transfer-learning-xlnet-base.ipynb new file mode 100644 index 0000000..14d1381 --- /dev/null +++ b/text-classification/76.transfer-learning-xlnet-base.ipynb @@ -0,0 +1 @@ +{"cells":[{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true},"cell_type":"code","source":"!wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip\n!unzip cased_L-12_H-768_A-12.zip\n!wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/utils.py\n!wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data.zip\n!wget https://raw.githubusercontent.com/zihangdai/xlnet/master/xlnet.py\n!wget 
https://raw.githubusercontent.com/zihangdai/xlnet/master/modeling.py\n!wget https://raw.githubusercontent.com/zihangdai/xlnet/master/prepro_utils.py\n!wget https://raw.githubusercontent.com/zihangdai/xlnet/master/model_utils.py\n!unzip data.zip","execution_count":1,"outputs":[{"output_type":"stream","text":"--2019-08-06 11:21:51-- https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip\nResolving storage.googleapis.com (storage.googleapis.com)... 108.177.127.128, 2a00:1450:4013:c07::80\nConnecting to storage.googleapis.com (storage.googleapis.com)|108.177.127.128|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 433638019 (414M) [application/zip]\nSaving to: ‘cased_L-12_H-768_A-12.zip’\n\ncased_L-12_H-768_A- 100%[===================>] 413.55M 51.0MB/s in 8.4s \n\n2019-08-06 11:22:00 (49.0 MB/s) - ‘cased_L-12_H-768_A-12.zip’ saved [433638019/433638019]\n\nArchive: cased_L-12_H-768_A-12.zip\n creating: xlnet_cased_L-12_H-768_A-12/\n inflating: xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt.index \n inflating: xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt.data-00000-of-00001 \n inflating: xlnet_cased_L-12_H-768_A-12/spiece.model \n inflating: xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt.meta \n inflating: xlnet_cased_L-12_H-768_A-12/xlnet_config.json \n--2019-08-06 11:22:07-- https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/utils.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 1815 (1.8K) [text/plain]\nSaving to: ‘utils.py’\n\nutils.py 100%[===================>] 1.77K --.-KB/s in 0s \n\n2019-08-06 11:22:07 (51.0 MB/s) - ‘utils.py’ saved [1815/1815]\n\n--2019-08-06 11:22:07-- https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data.zip\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 460176 (449K) [application/zip]\nSaving to: ‘data.zip’\n\ndata.zip 100%[===================>] 449.39K --.-KB/s in 0.02s \n\n2019-08-06 11:22:08 (21.8 MB/s) - ‘data.zip’ saved [460176/460176]\n\n--2019-08-06 11:22:08-- https://raw.githubusercontent.com/zihangdai/xlnet/master/xlnet.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 9838 (9.6K) [text/plain]\nSaving to: ‘xlnet.py’\n\nxlnet.py 100%[===================>] 9.61K --.-KB/s in 0s \n\n2019-08-06 11:22:09 (87.5 MB/s) - ‘xlnet.py’ saved [9838/9838]\n\n--2019-08-06 11:22:09-- https://raw.githubusercontent.com/zihangdai/xlnet/master/modeling.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 28460 (28K) [text/plain]\nSaving to: ‘modeling.py’\n\nmodeling.py 100%[===================>] 27.79K --.-KB/s in 0.004s \n\n2019-08-06 11:22:09 (7.51 MB/s) - ‘modeling.py’ saved [28460/28460]\n\n--2019-08-06 11:22:10-- https://raw.githubusercontent.com/zihangdai/xlnet/master/prepro_utils.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 200 OK\nLength: 4546 (4.4K) [text/plain]\nSaving to: ‘prepro_utils.py’\n\nprepro_utils.py 100%[===================>] 4.44K --.-KB/s in 0s \n\n2019-08-06 11:22:10 (74.8 MB/s) - ‘prepro_utils.py’ saved [4546/4546]\n\n--2019-08-06 11:22:11-- https://raw.githubusercontent.com/zihangdai/xlnet/master/model_utils.py\nResolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\nConnecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\nHTTP request sent, awaiting response... 
200 OK\nLength: 14078 (14K) [text/plain]\nSaving to: ‘model_utils.py’\n\nmodel_utils.py 100%[===================>] 13.75K --.-KB/s in 0.004s \n\n2019-08-06 11:22:11 (3.68 MB/s) - ‘model_utils.py’ saved [14078/14078]\n\nArchive: data.zip\n creating: data/\n creating: data/positive/\n inflating: data/positive/positive \n inflating: data/.DS_Store \n creating: __MACOSX/\n creating: __MACOSX/data/\n inflating: __MACOSX/data/._.DS_Store \n creating: data/negative/\n inflating: data/negative/negative \n","name":"stdout"}]},{"metadata":{"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","trusted":true},"cell_type":"code","source":"!pip3 install sentencepiece","execution_count":2,"outputs":[{"output_type":"stream","text":"Requirement already satisfied: sentencepiece in /opt/conda/lib/python3.6/site-packages (0.1.82)\r\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"import xlnet\nimport numpy as np\nimport tensorflow as tf\nfrom tqdm import tqdm\nimport model_utils","execution_count":3,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"import sentencepiece as spm\nfrom prepro_utils import preprocess_text, encode_ids\n\nsp_model = spm.SentencePieceProcessor()\nsp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n\ndef tokenize_fn(text):\n text = preprocess_text(text, lower= False)\n return encode_ids(sp_model, text)","execution_count":4,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"from utils import *\n\ntrainset = sklearn.datasets.load_files(container_path = 'data', encoding = 'UTF-8')\ntrainset.data, trainset.target = separate_dataset(trainset,1.0)\nprint (trainset.target_names)\nprint (len(trainset.data))\nprint (len(trainset.target))","execution_count":5,"outputs":[{"output_type":"stream","text":"['negative', 'positive']\n10662\n10662\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"MAX_SEQ_LENGTH = 
128\n\nSEG_ID_A = 0\nSEG_ID_B = 1\nSEG_ID_CLS = 2\nSEG_ID_SEP = 3\nSEG_ID_PAD = 4\n\nspecial_symbols = {\n \"\" : 0,\n \"\" : 1,\n \"\" : 2,\n \"\" : 3,\n \"\" : 4,\n \"\" : 5,\n \"\" : 6,\n \"\" : 7,\n \"\" : 8,\n}\n\nVOCAB_SIZE = 32000\nUNK_ID = special_symbols[\"\"]\nCLS_ID = special_symbols[\"\"]\nSEP_ID = special_symbols[\"\"]\nMASK_ID = special_symbols[\"\"]\nEOD_ID = special_symbols[\"\"]\n\ninput_ids, input_masks, segment_ids = [], [], []\n\nfor text in tqdm(trainset.data):\n tokens_a = tokenize_fn(text)\n if len(tokens_a) > MAX_SEQ_LENGTH - 2:\n tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n \n tokens = []\n segment_id = []\n for token in tokens_a:\n tokens.append(token)\n segment_id.append(SEG_ID_A)\n tokens.append(SEP_ID)\n segment_id.append(SEG_ID_A)\n tokens.append(CLS_ID)\n segment_id.append(SEG_ID_CLS)\n \n input_id = tokens\n input_mask = [0] * len(input_id)\n if len(input_id) < MAX_SEQ_LENGTH:\n delta_len = MAX_SEQ_LENGTH - len(input_id)\n input_id = [0] * delta_len + input_id\n input_mask = [1] * delta_len + input_mask\n segment_id = [SEG_ID_PAD] * delta_len + segment_id\n \n input_ids.append(input_id)\n input_masks.append(input_mask)\n segment_ids.append(segment_id)","execution_count":6,"outputs":[{"output_type":"stream","text":"100%|██████████| 10662/10662 [00:01<00:00, 8699.94it/s]\n","name":"stderr"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"kwargs = dict(\n is_training=True,\n use_tpu=False,\n use_bfloat16=False,\n dropout=0,\n dropatt=0,\n init='normal',\n init_range=0.1,\n init_std=0.02,\n clamp_len=-1)\n\nxlnet_parameters = xlnet.RunConfig(**kwargs)\nxlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')","execution_count":7,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"epoch = 10\nbatch_size = 10\nwarmup_proportion = 0.1\nnum_train_steps = int(len(input_ids) / batch_size * epoch)\nnum_warmup_steps = int(num_train_steps * 
warmup_proportion)\nprint(num_train_steps, num_warmup_steps)\n\ntraining_parameters = dict(\n decay_method = 'poly',\n train_steps = num_train_steps,\n learning_rate = 2e-5,\n warmup_steps = num_warmup_steps,\n min_lr_ratio = 0.0,\n weight_decay = 0.00,\n adam_epsilon = 1e-8,\n num_core_per_host = 1,\n lr_layer_decay_rate = 1,\n use_tpu=False,\n use_bfloat16=False,\n dropout=0.0,\n dropatt=0.0,\n init='normal',\n init_range=0.1,\n init_std=0.02,\n clip = 1.0,\n clamp_len=-1,)","execution_count":8,"outputs":[{"output_type":"stream","text":"10662 1066\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"class Parameter:\n def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n min_lr_ratio, clip, **kwargs):\n self.decay_method = decay_method\n self.warmup_steps = warmup_steps\n self.weight_decay = weight_decay\n self.adam_epsilon = adam_epsilon\n self.num_core_per_host = num_core_per_host\n self.lr_layer_decay_rate = lr_layer_decay_rate\n self.use_tpu = use_tpu\n self.learning_rate = learning_rate\n self.train_steps = train_steps\n self.min_lr_ratio = min_lr_ratio\n self.clip = clip\n \ntraining_parameters = Parameter(**training_parameters)","execution_count":9,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"class Model:\n def __init__(\n self,\n dimension_output,\n learning_rate = 2e-5,\n ):\n self.X = tf.placeholder(tf.int32, [None, None])\n self.segment_ids = tf.placeholder(tf.int32, [None, None])\n self.input_masks = tf.placeholder(tf.float32, [None, None])\n self.Y = tf.placeholder(tf.int32, [None])\n \n xlnet_model = xlnet.XLNetModel(\n xlnet_config=xlnet_config,\n run_config=xlnet_parameters,\n input_ids=tf.transpose(self.X, [1, 0]),\n seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n input_mask=tf.transpose(self.input_masks, [1, 0]))\n \n summary = xlnet_model.get_pooled_out(\"last\", True)\n print(summary)\n 
\n self.logits = tf.layers.dense(summary, dimension_output)\n \n self.cost = tf.reduce_mean(\n tf.nn.sparse_softmax_cross_entropy_with_logits(\n logits = self.logits, labels = self.Y\n )\n )\n \n self.optimizer, self.learning_rate, _ = model_utils.get_train_op(training_parameters, self.cost)\n \n correct_pred = tf.equal(\n tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n )\n self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))","execution_count":10,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"dimension_output = 2\nlearning_rate = 2e-5\n\ntf.reset_default_graph()\nsess = tf.InteractiveSession()\nmodel = Model(\n dimension_output,\n learning_rate\n)\n\nsess.run(tf.global_variables_initializer())","execution_count":11,"outputs":[{"output_type":"stream","text":"Tensor(\"model_1/sequnece_summary/summary/Tanh:0\", shape=(?, 768), dtype=float32)\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"import collections\nimport re\n\ndef get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n assignment_map = {}\n initialized_variable_names = {}\n\n name_to_variable = collections.OrderedDict()\n for var in tvars:\n name = var.name\n m = re.match('^(.*):\\\\d+$', name)\n if m is not None:\n name = m.group(1)\n name_to_variable[name] = var\n\n init_vars = tf.train.list_variables(init_checkpoint)\n\n assignment_map = collections.OrderedDict()\n for x in init_vars:\n (name, var) = (x[0], x[1])\n if name not in name_to_variable:\n continue\n assignment_map[name] = name_to_variable[name]\n initialized_variable_names[name] = 1\n initialized_variable_names[name + ':0'] = 1\n\n return (assignment_map, initialized_variable_names)","execution_count":12,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"tvars = tf.trainable_variables()\ncheckpoint = 
'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\nassignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n checkpoint)","execution_count":13,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"saver = tf.train.Saver(var_list = assignment_map)\nsaver.restore(sess, checkpoint)","execution_count":14,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn.model_selection import train_test_split\n\ntrain_input_ids, test_input_ids, train_input_masks, test_input_masks, train_segment_ids, test_segment_ids, train_Y, test_Y = train_test_split(\n input_ids, input_masks, segment_ids, trainset.target, test_size = 0.2\n)","execution_count":15,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"from tqdm import tqdm\nimport time\n\nEARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 3, 0, 0, 0\n\nwhile True:\n lasttime = time.time()\n if CURRENT_CHECKPOINT == EARLY_STOPPING:\n print('break epoch:%d\\n' % (EPOCH))\n break\n\n train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0\n pbar = tqdm(\n range(0, len(train_input_ids), batch_size), desc = 'train minibatch loop'\n )\n for i in pbar:\n index = min(i + batch_size, len(train_input_ids))\n batch_x = train_input_ids[i: index]\n batch_masks = train_input_masks[i: index]\n batch_segment = train_segment_ids[i: index]\n batch_y = train_Y[i: index]\n acc, cost, _ = sess.run(\n [model.accuracy, model.cost, model.optimizer],\n feed_dict = {\n model.Y: batch_y,\n model.X: batch_x,\n model.segment_ids: batch_segment,\n model.input_masks: batch_masks\n },\n )\n assert not np.isnan(cost)\n train_loss += cost\n train_acc += acc\n pbar.set_postfix(cost = cost, accuracy = acc)\n pbar = tqdm(range(0, len(test_input_ids), batch_size), desc = 'test minibatch loop')\n for i in pbar:\n index = min(i + batch_size, len(test_input_ids))\n batch_x = test_input_ids[i: index]\n batch_masks = test_input_masks[i: index]\n batch_segment = 
test_segment_ids[i: index]\n batch_y = test_Y[i: index]\n acc, cost = sess.run(\n [model.accuracy, model.cost],\n feed_dict = {\n model.Y: batch_y,\n model.X: batch_x,\n model.segment_ids: batch_segment,\n model.input_masks: batch_masks\n },\n )\n test_loss += cost\n test_acc += acc\n pbar.set_postfix(cost = cost, accuracy = acc)\n\n train_loss /= len(train_input_ids) / batch_size\n train_acc /= len(train_input_ids) / batch_size\n test_loss /= len(test_input_ids) / batch_size\n test_acc /= len(test_input_ids) / batch_size\n\n if test_acc > CURRENT_ACC:\n print(\n 'epoch: %d, pass acc: %f, current acc: %f'\n % (EPOCH, CURRENT_ACC, test_acc)\n )\n CURRENT_ACC = test_acc\n CURRENT_CHECKPOINT = 0\n else:\n CURRENT_CHECKPOINT += 1\n \n print('time taken:', time.time() - lasttime)\n print(\n 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n )\n EPOCH += 1","execution_count":16,"outputs":[{"output_type":"stream","text":"train minibatch loop: 100%|██████████| 853/853 [03:45<00:00, 4.00it/s, accuracy=0.889, cost=0.34]\ntest minibatch loop: 100%|██████████| 214/214 [00:21<00:00, 10.01it/s, accuracy=0.667, cost=0.335]\ntrain minibatch loop: 0%| | 0/853 [00:00 MAX_SEQ_LENGTH - 2:\n", + " tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + " padding = [0] * (MAX_SEQ_LENGTH - len(input_id))\n", + " input_id += padding\n", + " input_mask += padding\n", + " segment_id += padding\n", + " \n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:116: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "albert_config = modeling.AlbertConfig.from_json_file('assets/albert_config.json')\n", + "albert_config" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['▁moving', '▁uneven', '▁success']" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tokenizer.tokenize(trainset.data[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "epoch = 10\n", + "batch_size = 32\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(input_ids) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " model = modeling.AlbertModel(\n", + " config=albert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " self.logits = tf.layers.dense(output_layer, dimension_output)\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", 
+ " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:194: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:507: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:588: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:1025: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/modeling.py:253: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. 
Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING: Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/optimization.py:36: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/optimization.py:41: The name tf.train.polynomial_decay is deprecated. 
Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py:409: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Deprecated in favor of operator or tf.math.divide.\n", + "INFO:tensorflow:++++++ warmup starts at step 0, for 333 steps ++++++\n", + "INFO:tensorflow:using adamw\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/albert/optimization.py:101: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1205: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "learning_rate = 5e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + " learning_rate\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ,\n", + " ]" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", 
+ "var_lists" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n", + "INFO:tensorflow:Restoring parameters from variables/variables\n" + ] + } + ], + "source": [ + "saver.restore(sess, 'variables/variables')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "train_input_ids, test_input_ids, train_input_masks, test_input_masks, train_segment_ids, test_segment_ids, train_Y, test_Y = train_test_split(\n", + " input_ids, input_masks, segment_ids, trainset.target, test_size = 0.2\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 267/267 [00:59<00:00, 4.46it/s, accuracy=0.765, cost=0.51] \n", + "test minibatch loop: 100%|██████████| 67/67 [00:05<00:00, 12.26it/s, accuracy=0.667, cost=0.721]\n", + "train minibatch loop: 0%| | 0/267 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-classification/78.electra-base.ipynb b/text-classification/78.electra-base.ipynb new file mode 100644 index 0000000..f614f34 --- /dev/null +++ b/text-classification/78.electra-base.ipynb @@ -0,0 +1,611 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Make sure to run this notebook in the electra repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/google-research/electra.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "checkpoint\t\t\t electra_base.index vocab.txt\r\n", + "electra_base.data-00000-of-00001 electra_base.meta\r\n" + ] + } + ], + "source": [ + "# !wget https://storage.googleapis.com/electra-data/electra_base.zip\n", + "# !unzip electra_base.zip\n", + "!ls electra_base/" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'vocab_size': 30522,\n", + " 'hidden_size': 768,\n", + " 'num_hidden_layers': 12,\n", + " 'num_attention_heads': 12,\n", + " 'hidden_act': 'gelu',\n", + " 'intermediate_size': 3072,\n", + " 'hidden_dropout_prob': 0.1,\n", + " 'attention_probs_dropout_prob': 0.1,\n",
'max_position_embeddings': 512,\n", + " 'type_vocab_size': 2,\n", + " 'initializer_range': 0.02}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import configure_finetuning\n", + "from util import training_utils\n", + "\n", + "hparams = {'model_size': 'base', 'vocab_size': 30522}\n", + "config = configure_finetuning.FinetuningConfig('electra-base', './electra_base/', **hparams)\n", + "bert_config = training_utils.get_bert_config(config)\n", + "\n", + "bert_config.__dict__" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from model import modeling\n", + "from model import optimization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/utils.py\n", + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data.zip\n", + "# !unzip data.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import *\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['negative', 'positive']\n", + "10662\n", + "10662\n" + ] + } + ], + "source": [ + "trainset = sklearn.datasets.load_files(container_path = 'data', encoding = 'UTF-8')\n", + "trainset.data, trainset.target = separate_dataset(trainset,1.0)\n", + "print (trainset.target_names)\n", + "print (len(trainset.data))\n", + "print (len(trainset.target))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "MAX_SEQ_LENGTH = 100" + ] + }, + { + "cell_type": "code", + 
"execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "from model import tokenization\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file='electra_base/vocab.txt',\n", + " do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 10662/10662 [00:02<00:00, 3926.33it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "input_ids, input_masks, segment_ids = [], [], []\n", + "\n", + "for text in tqdm(trainset.data):\n", + " tokens_a = tokenizer.tokenize(text.lower())\n", + " if len(tokens_a) > MAX_SEQ_LENGTH - 2:\n", + " tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + " padding = [0] * (MAX_SEQ_LENGTH - len(input_id))\n", + " input_id += padding\n", + " input_mask += padding\n", + " segment_id += padding\n", + " \n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "from finetune import task_builder" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "config" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'chunk'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tasks = task_builder.get_tasks(config)\n", + "tasks[0].name" + ] + }, + { + "cell_type": "code", + "execution_count": 
14, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 32\n", + "epoch = 10\n", + "num_train_steps = int(len(input_ids) / batch_size * epoch)\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " model = modeling.BertModel(\n", + " bert_config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " \n", + " with tf.variable_scope(\"task_specific/classify\"):\n", + " self.logits = tf.layers.dense(output_layer, dimension_output)\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " self.optimizer = optimization.create_optimizer(\n", + " self.cost, config.learning_rate, num_train_steps,\n", + " weight_decay_rate=config.weight_decay_rate,\n", + " use_tpu=config.use_tpu,\n", + " warmup_proportion=config.warmup_proportion,\n", + " layerwise_lr_decay_power=config.layerwise_lr_decay,\n", + " n_transformer_layers=bert_config.num_hidden_layers\n", + " )\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/nlp-english/electra/model/modeling.py:698: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a 
future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from electra_base/electra_base\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'electra')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, 'electra_base/electra_base')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "train_input_ids, test_input_ids, train_input_masks, test_input_masks, train_segment_ids, test_segment_ids, train_Y, test_Y = train_test_split(\n", + " input_ids, input_masks, segment_ids, trainset.target, test_size = 0.2\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train 
minibatch loop: 100%|██████████| 267/267 [01:05<00:00, 4.09it/s, accuracy=0.941, cost=0.192]\n", + "test minibatch loop: 100%|██████████| 67/67 [00:04<00:00, 13.89it/s, accuracy=0.857, cost=0.4] \n", + "train minibatch loop: 0%| | 0/267 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-classification/79.electra-large.ipynb b/text-classification/79.electra-large.ipynb new file mode 100644 index 0000000..b2a8813 --- /dev/null +++ b/text-classification/79.electra-large.ipynb @@ -0,0 +1,492 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Make sure to run this notebook in the electra repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/google-research/electra.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": 
"code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "checkpoint\t\t\t electra_large.index\tvocab.txt\r\n", + "electra_large.data-00000-of-00001 electra_large.meta\r\n" + ] + } + ], + "source": [ + "# !wget https://storage.googleapis.com/electra-data/electra_large.zip\n", + "# !unzip electra_large.zip\n", + "!ls electra_large/" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'vocab_size': 30522,\n", + " 'hidden_size': 1024,\n", + " 'num_hidden_layers': 24,\n", + " 'num_attention_heads': 16,\n", + " 'hidden_act': 'gelu',\n", + " 'intermediate_size': 4096,\n", + " 'hidden_dropout_prob': 0.1,\n", + " 'attention_probs_dropout_prob': 0.1,\n", + " 'max_position_embeddings': 512,\n", + " 'type_vocab_size': 2,\n", + " 'initializer_range': 0.02}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import configure_finetuning\n", + "from util import training_utils\n", + "\n", + "hparams = {'model_size': 'large', 'vocab_size': 30522}\n", + "config = configure_finetuning.FinetuningConfig('electra-large', './', **hparams)\n", + "bert_config = training_utils.get_bert_config(config)\n", + "\n", + "bert_config.__dict__" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from model import modeling\n", + "from model import optimization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/utils.py\n", + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data.zip\n", + "# !unzip data.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + 
"outputs": [], + "source": [ + "from utils import *\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['negative', 'positive']\n", + "10662\n", + "10662\n" + ] + } + ], + "source": [ + "trainset = sklearn.datasets.load_files(container_path = 'data', encoding = 'UTF-8')\n", + "trainset.data, trainset.target = separate_dataset(trainset,1.0)\n", + "print (trainset.target_names)\n", + "print (len(trainset.data))\n", + "print (len(trainset.target))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "MAX_SEQ_LENGTH = 100" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "from model import tokenization\n", + "\n", + "# use the vocab shipped with the electra_large checkpoint unpacked above\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file='electra_large/vocab.txt',\n", + " do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 10662/10662 [00:02<00:00, 3885.42it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "input_ids, input_masks, segment_ids = [], [], []\n", + "\n", + "for text in tqdm(trainset.data):\n", + " tokens_a = tokenizer.tokenize(text.lower())\n", + " if len(tokens_a) > MAX_SEQ_LENGTH - 2:\n", + " tokens_a = tokens_a[:(MAX_SEQ_LENGTH - 2)]\n", + " tokens = [\"[CLS]\"] + tokens_a + [\"[SEP]\"]\n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + " padding = [0] * (MAX_SEQ_LENGTH - len(input_id))\n", + " input_id += padding\n", + " input_mask += padding\n", + " segment_id += padding\n", + " \n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " 
segment_ids.append(segment_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 32\n", + "epoch = 10\n", + "num_train_steps = int(len(input_ids) / batch_size * epoch)\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " model = modeling.BertModel(\n", + " bert_config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " with tf.variable_scope(\"task_specific/classify\"):\n", + " self.logits = tf.layers.dense(output_layer, dimension_output)\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " self.optimizer = optimization.create_optimizer(\n", + " self.cost, config.learning_rate, num_train_steps,\n", + " weight_decay_rate=config.weight_decay_rate,\n", + " use_tpu=config.use_tpu,\n", + " warmup_proportion=config.warmup_proportion,\n", + " layerwise_lr_decay_power=config.layerwise_lr_decay,\n", + " n_transformer_layers=bert_config.num_hidden_layers\n", + " )\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/nlp-english/electra/model/modeling.py:698: 
dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from electra_large/electra_large\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'electra')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, 'electra_large/electra_large')" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "train_input_ids, test_input_ids, train_input_masks, test_input_masks, train_segment_ids, test_segment_ids, train_Y, test_Y = train_test_split(\n", + " input_ids, input_masks, segment_ids, trainset.target, test_size = 0.2\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": false + }, 
+ "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 267/267 [03:19<00:00, 1.34it/s, accuracy=0.824, cost=0.335]\n", + "test minibatch loop: 100%|██████████| 67/67 [00:14<00:00, 4.73it/s, accuracy=0.81, cost=0.379] \n", + "train minibatch loop: 0%| | 0/267 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-classification/README.md b/text-classification/README.md index 310dcfc..4ee1aca 100644 --- a/text-classification/README.md +++ b/text-classification/README.md @@ -6,7 +6,7 @@ ## Score and average time taken per epoch, not sorted -Based on 20% validation. The results will be different on different dataset. Trained on a GTX 960, 4GB VRAM. +Based on 20% validation; time taken is measured on a single Tesla V100 (32GB VRAM). | name | accuracy | time taken (s) | |--------------------------------------|----------|----------------| @@ -81,7 +81,11 @@ Based on 20% validation. The results will be different on different dataset. 
Tra | 69. slice-gru-bahdanau | 0.70 | 20.247409 | | 70. wavenet | 0.59 | 101.293274 | | 71. transfer-learning-bert | 0.81 | 887.590460 | -| 72. transfer-learning-xlnet | 0.846 | 340.7679 | +| 72. transfer-learning-xlnet-large | 0.846 | 340.7679 | | 73. lstm-birnn-max-avg | 0.7552 | 9.35624 | | 74. transfer-learning-bert-base-6 | 0.7655 | 494.169 | | 75. transfer-learning-bert-large-12 | 0.80 | 1365.30 | +| 76. transfer-learning-xlnet-base | 0.820441 | 240.262 | +| 77. transfer-learning-albert-base | 0.799053 | 61.8179 | +| 78. transfer-learning-electra-base | 0.836336 | 66.0257 | +| 79. transfer-learning-electra-large | 0.875248 | 195.37280 | diff --git a/text-similarity/1.birnn-contrastive.ipynb b/text-similarity/1.birnn-contrastive.ipynb index 588cbac..3e0479c 100644 --- a/text-similarity/1.birnn-contrastive.ipynb +++ b/text-similarity/1.birnn-contrastive.ipynb @@ -6,32 +6,18 @@ "metadata": {}, "outputs": [], "source": [ - "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/jupyter/.local/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. 
This module will be removed in 0.20.\n", - " \"This module will be removed in 0.20.\", DeprecationWarning)\n" - ] - } - ], + "outputs": [], "source": [ "import tensorflow as tf\n", - "import re\n", - "import numpy as np\n", - "import pandas as pd\n", - "from tqdm import tqdm\n", - "import collections\n", - "from unidecode import unidecode\n", - "from sklearn.cross_validation import train_test_split" + "import json" ] }, { @@ -40,224 +26,14 @@ "metadata": {}, "outputs": [], "source": [ - "def build_dataset(words, n_words):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " count.extend(collections.Counter(words).most_common(n_words - 1))\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary\n", - "\n", - "def str_idx(corpus, dic, maxlen, UNK=3):\n", - " X = np.zeros((len(corpus),maxlen))\n", - " for i in range(len(corpus)):\n", - " for no, k in enumerate(corpus[i][:maxlen][::-1]):\n", - " val = dic[k] if k in dic else UNK\n", - " X[i,-1 - no]= val\n", - " return X\n", - "\n", - "def cleaning(string):\n", - " string = unidecode(string).replace('.', ' . ').replace(',', ' , ')\n", - " string = re.sub('[^A-Za-z\\- ]+', ' ', string)\n", - " string = re.sub(r'[ ]+', ' ', string).strip()\n", - " return string.lower()" + "with open('contrastive.json') as fopen:\n", + " data = json.load(fopen)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "
" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([0, 1]), array([255024, 149263]))" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.unique(label, return_counts = True)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [00:07<00:00, 54874.65it/s]\n" - ] - } - ], - "source": [ - "for i in tqdm(range(len(left))):\n", - " left[i] = cleaning(left[i])\n", - " right[i] = cleaning(right[i])" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from 
size: 87661\n", - "Most common words [('the', 377593), ('what', 324635), ('is', 269934), ('i', 223893), ('how', 220876), ('a', 212757)]\n", - "Sample data [5, 6, 4, 1285, 62, 1285, 2501, 10, 564, 11] ['what', 'is', 'the', 'step', 'by', 'step', 'guide', 'to', 'invest', 'in']\n" - ] - } - ], - "source": [ - "concat = ' '.join(left + right).split()\n", - "vocabulary_size = len(list(set(concat)))\n", - "data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size)\n", - "print('vocab from size: %d'%(vocabulary_size))\n", - "print('Most common words', count[4:10])\n", - "print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, "outputs": [], "source": [ "class Model:\n", @@ -313,106 +89,114 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "size_layer = 256\n", "num_layers = 2\n", - "embedded_size = 128\n", + "embedded_size = 256\n", "learning_rate = 1e-3\n", - "maxlen = 50\n", "batch_size = 128\n", - "dropout = 0.8" + "dropout = 1.0\n", + "vocab_size = 30000" ] }, { "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.cross_validation import train_test_split\n", - "\n", - "vectors_left = str_idx(left, dictionary, maxlen)\n", - "vectors_right = str_idx(right, dictionary, maxlen)\n", - "train_X_left, test_X_left, train_X_right, test_X_right, train_Y, test_Y = train_test_split(vectors_left,\n", - " vectors_right,\n", - " label,\n", - " test_size = 0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, + "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future 
version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "WARNING:tensorflow:From :6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From :6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", - "\n", - "WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", "For more information, please see:\n", " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", "If you depend on functionality not listed there, please file an issue.\n", "\n", - "WARNING:tensorflow:From :17: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From :17: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py:443: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Please use `keras.layers.RNN(cell)`, which is 
equivalent to this API\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py:1259: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", - "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n", - "WARNING:tensorflow:From :37: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :37: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "keep_dims is deprecated, use keepdims instead\n", - "WARNING:tensorflow:From :41: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From :41: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Deprecated in favor of operator or tf.math.divide.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from 
tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", - "Use tf.cast instead.\n" + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" ] } ], "source": [ "tf.reset_default_graph()\n", "sess = tf.InteractiveSession()\n", - "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate,dropout)\n", + "model = Model(size_layer,num_layers,embedded_size,vocab_size,learning_rate,dropout)\n", "sess.run(tf.global_variables_initializer())" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "train_X_left = data['left_train']\n", + "train_X_right = data['right_train']\n", + "train_Y = data['label_train']\n", + "test_X_left = data['left_test']\n", + "test_X_right = data['right_test']\n", + "test_Y = data['label_test']" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [12:54<00:00, 3.32it/s, accuracy=0.762, cost=0.0892]\n", - "test minibatch loop: 100%|██████████| 632/632 [01:30<00:00, 7.02it/s, accuracy=0.611, cost=0.114] \n", - "train minibatch loop: 0%| | 0/2527 [00:00 CURRENT_ACC:\n", " print(\n", @@ -708,29 +334,6 @@ " test_acc))" ] }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[array([0.], dtype=float32), array([0.13218915], dtype=float32)]" - ] - }, - "execution_count": 14, - 
"metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "left = str_idx(['a person is outdoors, on a horse.'], dictionary, maxlen)\n", - "right = str_idx(['a person on a horse jumps over a broken down airplane.'], dictionary, maxlen)\n", - "sess.run([model.temp_sim,1-model.distance], feed_dict = {model.X_left : left, \n", - " model.X_right: right})" - ] - }, { "cell_type": "code", "execution_count": null, diff --git a/text-similarity/10.xlnet-base-circle-loss.ipynb b/text-similarity/10.xlnet-base-circle-loss.ipynb new file mode 100644 index 0000000..6c308bf --- /dev/null +++ b/text-similarity/10.xlnet-base-circle-loss.ipynb @@ -0,0 +1,845 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Make sure run this notebook in xlnet repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/zihangdai/xlnet.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n", + "# !unzip xlnet.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import sentencepiece as spm\n", + "from prepro_utils import preprocess_text, encode_ids\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "\n", + "def tokenize_fn(text):\n", + " text = preprocess_text(text, lower= False)\n", + " return encode_ids(sp_model, text)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP 
= 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + " \"\" : 0,\n", + " \"\" : 1,\n", + " \"\" : 2,\n", + " \"\" : 3,\n", + " \"\" : 4,\n", + " \"\" : 5,\n", + " \"\" : 6,\n", + " \"\" : 7,\n", + " \"\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"\"]\n", + "CLS_ID = special_symbols[\"\"]\n", + "SEP_ID = special_symbols[\"\"]\n", + "MASK_ID = special_symbols[\"\"]\n", + "EOD_ID = special_symbols[\"\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('../text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from tqdm import tqdm\n", + "\n", + "MAX_SEQ_LENGTH = 120\n", + "\n", + "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + " \n", + "def get_data(left, right):\n", + " input_ids, input_mask, all_seg_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens = tokenize_fn(left[i])\n", + " tokens_right = tokenize_fn(right[i])\n", + " \n", + " _truncate_seq_pair(tokens, tokens_right, MAX_SEQ_LENGTH - 3)\n", + " segment_ids = [SEG_ID_A] * len(tokens)\n", + " tokens.append(SEP_ID)\n", + " segment_ids.append(SEG_ID_A)\n", + "\n", + " tokens.extend(tokens_right)\n", + " segment_ids.extend([SEG_ID_B] * len(tokens_right))\n", + " tokens.append(SEP_ID)\n", + " segment_ids.append(SEG_ID_B)\n", + "\n", + " tokens.append(CLS_ID)\n", + " 
segment_ids.append(SEG_ID_CLS)\n", + "\n", + " cur_input_ids = tokens\n", + " cur_input_mask = [0] * len(cur_input_ids)\n", + " assert len(tokens) == len(cur_input_mask)\n", + " assert len(tokens) == len(segment_ids)\n", + " input_ids.append(tokens)\n", + " input_mask.append(cur_input_mask)\n", + " all_seg_ids.append(segment_ids)\n", + " return input_ids, input_mask, all_seg_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [01:14<00:00, 3504.24it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:03<00:00, 4423.89it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/xlnet.py:63: The name tf.gfile.Open is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "import xlnet\n", + "import model_utils\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.1,\n", + " dropatt=0.1,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "245439 24543\n" + ] + } + ], + "source": [ + "epoch = 15\n", + "batch_size = 16\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(train_ids) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)\n", + "print(num_train_steps, num_warmup_steps)\n", + "\n", + "training_parameters = dict(\n", + " decay_method = 'poly',\n", + " train_steps = num_train_steps,\n", + " learning_rate = 2e-5,\n", + " warmup_steps = num_warmup_steps,\n", + " min_lr_ratio = 0.0,\n", + " weight_decay = 0.00,\n", + " adam_epsilon = 1e-8,\n", + " num_core_per_host = 1,\n", + " lr_layer_decay_rate = 1,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.02,\n", + " clip = 1.0,\n", + " clamp_len=-1,)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "class Parameter:\n", + " def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n", + " num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n", + " min_lr_ratio, clip, **kwargs):\n", + " self.decay_method = decay_method\n", + " self.warmup_steps = warmup_steps\n", + " 
self.weight_decay = weight_decay\n", + " self.adam_epsilon = adam_epsilon\n", + " self.num_core_per_host = num_core_per_host\n", + " self.lr_layer_decay_rate = lr_layer_decay_rate\n", + " self.use_tpu = use_tpu\n", + " self.learning_rate = learning_rate\n", + " self.train_steps = train_steps\n", + " self.min_lr_ratio = min_lr_ratio\n", + " self.clip = clip\n", + " \n", + "training_parameters = Parameter(**training_parameters)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "768" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "xlnet_config.d_model" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=tf.transpose(self.X, [1, 0]),\n", + " seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n", + " input_mask=tf.transpose(self.input_masks, [1, 0]))\n", + " \n", + " summary = xlnet_model.get_pooled_out(\"last\", True)\n", + " self.out = tf.layers.dense(summary, xlnet_config.d_model)\n", + " self.out = tf.nn.l2_normalize(self.out, 1)\n", + " self.logits = tf.layers.dense(self.out,dimension_output,use_bias=False,\n", + " kernel_constraint=tf.keras.constraints.unit_norm())\n", + " \n", + " self.gamma = 64\n", + " self.margin = 0.25\n", + " self.O_p = 1 + self.margin\n", + " self.O_n = -self.margin\n", + " self.Delta_p = 1 - 
self.margin\n", + " self.Delta_n = self.margin\n", + " \n", + " self.batch_idxs = tf.expand_dims(\n", + " tf.range(0, self.batch_size, dtype=tf.int32), 1) # shape [batch,1]\n", + " idxs = tf.concat([self.batch_idxs, tf.cast(self.Y, tf.int32)], 1)\n", + " sp = tf.expand_dims(tf.gather_nd(self.logits, idxs), 1)\n", + " mask = tf.logical_not(\n", + " tf.scatter_nd(idxs, tf.ones(tf.shape(idxs)[0], tf.bool),\n", + " tf.shape(self.logits)))\n", + "\n", + " sn = tf.reshape(tf.boolean_mask(self.logits, mask), (self.batch_size, -1))\n", + "\n", + " alpha_p = tf.nn.relu(self.O_p - tf.stop_gradient(sp))\n", + " alpha_n = tf.nn.relu(tf.stop_gradient(sn) - self.O_n)\n", + "\n", + " r_sp_m = alpha_p * (sp - self.Delta_p)\n", + " r_sn_m = alpha_n * (sn - self.Delta_n)\n", + " _Z = tf.concat([r_sn_m, r_sp_m], 1)\n", + " _Z = _Z * self.gamma\n", + " # sum all similarity\n", + " logZ = tf.math.reduce_logsumexp(_Z, 1, keepdims=True)\n", + " # remove sn_p from all sum similarity\n", + " self.cost = -r_sp_m * self.gamma + logZ\n", + " self.cost = tf.reduce_mean(self.cost[:,0])\n", + " \n", + " self.optimizer, self.learning_rate, _ = model_utils.get_train_op(training_parameters, self.cost)\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y[:,0]\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/xlnet.py:220: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/xlnet.py:220: The name tf.AUTO_REUSE is deprecated. 
Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:453: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:535: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + 
"Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:96: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:108: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:131: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "learning_rate = 2e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(dimension_output, learning_rate)\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " 
initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_ids[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_y = train_Y[:5]\n", + "batch_y = np.expand_dims(batch_y,1)\n", + "batch_segments = segment_train[:5]\n", + "batch_segments = pad_sequences(batch_segments, padding='post', value = 4)\n", + "batch_masks = train_masks[:5]\n", + "batch_masks = pad_sequences(batch_masks, padding='post', value = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.2, 55.720844]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"sess.run([model.accuracy, model.cost],\n", + " feed_dict = {model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks})" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [52:21<00:00, 5.21it/s, accuracy=1, cost=0.0359] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:54<00:00, 15.41it/s, accuracy=1, cost=0.00785] \n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:11, 5.78it/s, accuracy=1, cost=0.482]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 0, pass acc: 0.000000, current acc: 0.914827\n", + "time taken: 3195.6359593868256\n", + "epoch: 0, training loss: 10.497539, training acc: 0.848343, valid loss: 5.863732, valid acc: 0.914827\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [52:02<00:00, 5.24it/s, accuracy=1, cost=0.00451] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:53<00:00, 15.59it/s, accuracy=1, cost=0.00506] \n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:39, 5.72it/s, accuracy=0.938, cost=4.09]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 1, pass acc: 0.914827, current acc: 0.927431\n", + "time taken: 3176.4944434165955\n", + "epoch: 1, training loss: 5.274645, training acc: 0.924659, valid loss: 5.358865, valid acc: 0.927431\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [52:19<00:00, 5.21it/s, accuracy=1, cost=0.00229] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:52<00:00, 15.87it/s, accuracy=1, cost=0.00256] \n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:11, 5.78it/s, accuracy=0.938, cost=4.94]" + ] + }, + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "epoch: 2, pass acc: 0.927431, current acc: 0.936083\n", + "time taken: 3192.366697072983\n", + "epoch: 2, training loss: 3.858834, training acc: 0.948344, valid loss: 5.224887, valid acc: 0.936083\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [50:28<00:00, 5.40it/s, accuracy=1, cost=0.00073] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:52<00:00, 15.98it/s, accuracy=1, cost=0.000478]\n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:44, 5.71it/s, accuracy=1, cost=0.000715]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 3, pass acc: 0.936083, current acc: 0.937425\n", + "time taken: 3081.159793138504\n", + "epoch: 3, training loss: 2.776629, training acc: 0.964443, valid loss: 5.418724, valid acc: 0.937425\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [49:28<00:00, 5.51it/s, accuracy=1, cost=0.000696]\n", + "test minibatch loop: 100%|██████████| 838/838 [00:52<00:00, 16.08it/s, accuracy=1, cost=0.00111] \n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:49, 5.70it/s, accuracy=1, cost=0.0011]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 4, pass acc: 0.937425, current acc: 0.937724\n", + "time taken: 3020.8846673965454\n", + "epoch: 4, training loss: 1.990225, training acc: 0.975364, valid loss: 5.434131, valid acc: 0.937724\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [52:20<00:00, 5.21it/s, accuracy=1, cost=0.00395] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:52<00:00, 15.88it/s, accuracy=1, cost=0.00252] \n", + "train minibatch loop: 0%| | 1/16363 [00:00<47:00, 5.80it/s, accuracy=1, cost=0.00254]" + ] + }, + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "epoch: 5, pass acc: 0.937724, current acc: 0.940334\n", + "time taken: 3192.947704553604\n", + "epoch: 5, training loss: 1.449264, training acc: 0.982793, valid loss: 5.935048, valid acc: 0.940334\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 16363/16363 [50:09<00:00, 5.44it/s, accuracy=1, cost=0.00803] \n", + "test minibatch loop: 100%|██████████| 838/838 [00:53<00:00, 15.68it/s, accuracy=1, cost=0.00222] " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "time taken: 3063.1185958385468\n", + "epoch: 6, training loss: 1.140489, training acc: 0.987418, valid loss: 6.756607, valid acc: 0.939887\n", + "\n", + "break epoch:7\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "import time\n", + "\n", + "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 1, 0, 0, 0\n", + "\n", + "while True:\n", + " lasttime = time.time()\n", + " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", + " print('break epoch:%d\\n' % (EPOCH))\n", + " break\n", + " train_acc, train_loss = [], []\n", + " test_acc, test_loss = [], []\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(train_ids), batch_size), desc = 'train minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_ids))\n", + " batch_x = train_ids[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_y = train_Y[i: index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " batch_segments = segment_train[i: index]\n", + " batch_segments = pad_sequences(batch_segments, padding='post', value = 4)\n", + " batch_masks = train_masks[i: index]\n", + " batch_masks = pad_sequences(batch_masks, padding='post', value = 1)\n", + " acc, cost, _ = sess.run(\n", + " [model.accuracy, model.cost, model.optimizer],\n", + " feed_dict = {\n", + " 
model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks\n", + " },\n", + " )\n", + " train_loss.append(cost)\n", + " train_acc.append(acc)\n", + " pbar.set_postfix(cost = cost, accuracy = acc)\n", + " \n", + " pbar = tqdm(\n", + " range(0, len(test_ids), batch_size), desc = 'test minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_ids))\n", + " batch_x = test_ids[i: index]\n", + " batch_x = pad_sequences(batch_x,padding='post')\n", + " batch_y = test_Y[i: index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " batch_segments = segment_test[i: index]\n", + " batch_segments = pad_sequences(batch_segments, padding='post', value = 4)\n", + " batch_masks = test_masks[i: index]\n", + " batch_masks = pad_sequences(batch_masks, padding='post', value = 1)\n", + " \n", + " acc, cost = sess.run(\n", + " [model.accuracy, model.cost],\n", + " feed_dict = {\n", + " model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks\n", + " },\n", + " )\n", + " test_loss.append(cost)\n", + " test_acc.append(acc)\n", + " pbar.set_postfix(cost = cost, accuracy = acc)\n", + " \n", + " train_loss = np.mean(train_loss)\n", + " train_acc = np.mean(train_acc)\n", + " test_loss = np.mean(test_loss)\n", + " test_acc = np.mean(test_acc)\n", + " \n", + " if test_acc > CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/2.birnn-cross-entropy.ipynb b/text-similarity/2.birnn-cross-entropy.ipynb new file mode 100644 index 0000000..c23b47f --- /dev/null +++ b/text-similarity/2.birnn-cross-entropy.ipynb @@ -0,0 +1,346 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('pair.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(self, size_layer, num_layers, embedded_size,\n", + " dict_size, learning_rate, dropout):\n", + " \n", + " def cells(size, reuse=False):\n", + " cell = tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " return tf.contrib.rnn.DropoutWrapper(cell,output_keep_prob=dropout)\n", + " \n", + " def birnn(inputs, scope):\n", + " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = 
inputs,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " inputs = tf.concat((out_fw, out_bw), 2)\n", + " return inputs[:,-1]\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", + " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " \n", + " self.out = birnn(embedded_left, 'left')\n", + " self.logits = tf.layers.dense(self.out, 2)\n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " \n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 256\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "dropout = 1.0\n", + "vocab_size = 30000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * 
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :17: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :28: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from 
tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(size_layer,num_layers,embedded_size,vocab_size,learning_rate,dropout)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['left_train', 'label_train', 'left_test', 'label_test'])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X_left = data['left_train']\n", + "train_Y = data['label_train']\n", + "test_X_left = data['left_test']\n", + "test_Y = data['label_test']" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 2046/2046 [10:43<00:00, 3.18it/s, accuracy=0.714, cost=0.563]\n", + "test minibatch loop: 100%|██████████| 105/105 [00:12<00:00, 8.71it/s, accuracy=0.747, cost=0.492]\n", + "train minibatch loop: 0%| | 0/2046 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time()-lasttime)\n", + " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: 
%f\\n'%(EPOCH,train_loss,\n", + " train_acc,test_loss,\n", + " test_acc))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/2.dilated-cnn-contrastive.ipynb b/text-similarity/2.dilated-cnn-contrastive.ipynb deleted file mode 100644 index d7276c3..0000000 --- a/text-similarity/2.dilated-cnn-contrastive.ipynb +++ /dev/null @@ -1,648 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/jupyter/.local/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. 
This module will be removed in 0.20.\n", - " \"This module will be removed in 0.20.\", DeprecationWarning)\n" - ] - } - ], - "source": [ - "import tensorflow as tf\n", - "import re\n", - "import numpy as np\n", - "import pandas as pd\n", - "from tqdm import tqdm\n", - "import collections\n", - "from unidecode import unidecode\n", - "from sklearn.cross_validation import train_test_split" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " count.extend(collections.Counter(words).most_common(n_words - 1))\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary\n", - "\n", - "def str_idx(corpus, dic, maxlen, UNK=3):\n", - " X = np.zeros((len(corpus),maxlen))\n", - " for i in range(len(corpus)):\n", - " for no, k in enumerate(corpus[i][:maxlen][::-1]):\n", - " val = dic[k] if k in dic else UNK\n", - " X[i,-1 - no]= val\n", - " return X\n", - "\n", - "def cleaning(string):\n", - " string = unidecode(string).replace('.', ' . ').replace(',', ' , ')\n", - " string = re.sub('[^A-Za-z\\- ]+', ' ', string)\n", - " string = re.sub(r'[ ]+', ' ', string).strip()\n", - " return string.lower()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "
" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([0, 1]), array([255024, 149263]))" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.unique(label, return_counts = True)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [00:07<00:00, 54845.16it/s]\n" - ] - } - ], - "source": [ - "for i in tqdm(range(len(left))):\n", - " left[i] = cleaning(left[i])\n", - " right[i] = cleaning(right[i])" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from 
size: 87661\n", - "Most common words [('the', 377593), ('what', 324635), ('is', 269934), ('i', 223893), ('how', 220876), ('a', 212757)]\n", - "Sample data [5, 6, 4, 1285, 62, 1285, 2501, 10, 564, 11] ['what', 'is', 'the', 'step', 'by', 'step', 'guide', 'to', 'invest', 'in']\n" - ] - } - ], - "source": [ - "concat = ' '.join(left + right).split()\n", - "vocabulary_size = len(list(set(concat)))\n", - "data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size)\n", - "print('vocab from size: %d'%(vocabulary_size))\n", - "print('Most common words', count[4:10])\n", - "print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters = hidden_dim,\n", - " kernel_size = kernel_size,\n", 
- " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "class Model:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " dict_size, learning_rate, dropout, kernel_size = 5):\n", - " \n", - " def cnn(x, scope):\n", - " x += position_encoding(x)\n", - " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", - " for n in range(num_layers):\n", - " dilation_rate = 2 ** n\n", - " pad_sz = (kernel_size - 1) * dilation_rate \n", - " with tf.variable_scope('block_%d'%i,reuse=tf.AUTO_REUSE):\n", - " x += cnn_block(x, dilation_rate, pad_sz, size_layer, kernel_size)\n", - " \n", - " with tf.variable_scope('logits', reuse=tf.AUTO_REUSE):\n", - " return tf.layers.dense(x, size_layer)[:, -1]\n", - " \n", - " self.X_left = tf.placeholder(tf.int32, [None, None])\n", - " self.X_right = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.float32, [None])\n", - " self.batch_size = tf.shape(self.X_left)[0]\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", - " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X_left)\n", - " embedded_right = tf.nn.embedding_lookup(encoder_embeddings, self.X_right)\n", - " \n", - " def contrastive_loss(y,d):\n", - " tmp= y * tf.square(d)\n", - " tmp2 = (1-y) * tf.square(tf.maximum((1 - d),0))\n", - " return tf.reduce_sum(tmp +tmp2)/tf.cast(self.batch_size,tf.float32)/2\n", - " \n", - " self.output_left = cnn(embedded_left, 'left')\n", - " self.output_right = cnn(embedded_right, 'right')\n", - " print(self.output_left, self.output_right)\n", - " self.distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(self.output_left,self.output_right)),\n", - " 1,keep_dims=True))\n", - " self.distance = tf.div(self.distance, tf.add(tf.sqrt(tf.reduce_sum(tf.square(self.output_left),\n", - " 1,keep_dims=True)),\n", - " tf.sqrt(tf.reduce_sum(tf.square(self.output_right),\n", - " 
1,keep_dims=True))))\n", - " self.distance = tf.reshape(self.distance, [-1])\n", - " self.cost = contrastive_loss(self.Y,self.distance)\n", - " \n", - " self.temp_sim = tf.subtract(tf.ones_like(self.distance),\n", - " tf.rint(self.distance))\n", - " correct_predictions = tf.equal(self.temp_sim, self.Y)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, \"float\"))\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 1e-3\n", - "maxlen = 50\n", - "batch_size = 128\n", - "dropout = 0.8" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.cross_validation import train_test_split\n", - "\n", - "vectors_left = str_idx(left, dictionary, maxlen)\n", - "vectors_right = str_idx(right, dictionary, maxlen)\n", - "train_X_left, test_X_left, train_X_right, test_X_right, train_Y, test_Y = train_test_split(vectors_left,\n", - " vectors_right,\n", - " label,\n", - " test_size = 0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "WARNING:tensorflow:From :4: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n", - "WARNING:tensorflow:From :24: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will 
be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.conv1d instead.\n", - "WARNING:tensorflow:From :43: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dense instead.\n", - "Tensor(\"left/logits/strided_slice:0\", shape=(?, 128), dtype=float32) Tensor(\"right/logits/strided_slice:0\", shape=(?, 128), dtype=float32)\n", - "WARNING:tensorflow:From :62: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "keep_dims is deprecated, use keepdims instead\n", - "WARNING:tensorflow:From :66: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Deprecated in favor of operator or tf.math.divide.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate,dropout)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:56<00:00, 44.82it/s, accuracy=0.713, cost=0.0944]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 164.55it/s, accuracy=0.7, cost=0.0912] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.44it/s, accuracy=0.719, cost=0.0951]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"epoch: 0, pass acc: 0.000000, current acc: 0.718526\n", - "time taken: 60.22631072998047\n", - "epoch: 0, training loss: 0.102462, training acc: 0.686018, valid loss: 0.094624, valid acc: 0.718526\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.66it/s, accuracy=0.733, cost=0.0908]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 170.70it/s, accuracy=0.722, cost=0.0877]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.23it/s, accuracy=0.75, cost=0.0887] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.718526, current acc: 0.726600\n", - "time taken: 57.863216400146484\n", - "epoch: 0, training loss: 0.090969, training acc: 0.733650, valid loss: 0.091963, valid acc: 0.726600\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.46it/s, accuracy=0.812, cost=0.0809]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 170.07it/s, accuracy=0.733, cost=0.087] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.45it/s, accuracy=0.742, cost=0.0818]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.726600, current acc: 0.729846\n", - "time taken: 58.1152925491333\n", - "epoch: 0, training loss: 0.084663, training acc: 0.758519, valid loss: 0.090660, valid acc: 0.729846\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.46it/s, accuracy=0.812, cost=0.0746]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 170.52it/s, accuracy=0.756, cost=0.0854]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.34it/s, accuracy=0.773, cost=0.077] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - 
"time taken: 58.09940052032471\n", - "epoch: 0, training loss: 0.079745, training acc: 0.776804, valid loss: 0.091354, valid acc: 0.726319\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.48it/s, accuracy=0.812, cost=0.0669]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 170.46it/s, accuracy=0.744, cost=0.0865]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.80it/s, accuracy=0.812, cost=0.075] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 58.082401752471924\n", - "epoch: 0, training loss: 0.075484, training acc: 0.792712, valid loss: 0.092073, valid acc: 0.720588\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.52it/s, accuracy=0.861, cost=0.0598]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:03<00:00, 170.85it/s, accuracy=0.767, cost=0.0831]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 58.026214838027954\n", - "epoch: 0, training loss: 0.071810, training acc: 0.806165, valid loss: 0.091487, valid acc: 0.724296\n", - "\n", - "break epoch:0\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "import time\n", - "\n", - "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 3, 0, 0, 0\n", - "\n", - "while True:\n", - " lasttime = time.time()\n", - " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", - " print('break epoch:%d\\n' % (EPOCH))\n", - " break\n", - "\n", - " train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0\n", - " pbar = tqdm(range(0, len(train_X_left), batch_size), desc='train minibatch loop')\n", - " for i in pbar:\n", - " batch_x_left = train_X_left[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_x_right = 
train_X_right[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_y = train_Y[i:min(i+batch_size,train_X_left.shape[0])]\n", - " acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict = {model.X_left : batch_x_left, \n", - " model.X_right: batch_x_right,\n", - " model.Y : batch_y})\n", - " assert not np.isnan(loss)\n", - " train_loss += loss\n", - " train_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " pbar = tqdm(range(0, len(test_X_left), batch_size), desc='test minibatch loop')\n", - " for i in pbar:\n", - " batch_x_left = test_X_left[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_x_right = test_X_right[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_y = test_Y[i:min(i+batch_size,train_X_left.shape[0])]\n", - " acc, loss = sess.run([model.accuracy, model.cost], \n", - " feed_dict = {model.X_left : batch_x_left, \n", - " model.X_right: batch_x_right,\n", - " model.Y : batch_y})\n", - " test_loss += loss\n", - " test_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " train_loss /= (len(train_X_left) / batch_size)\n", - " train_acc /= (len(train_X_left) / batch_size)\n", - " test_loss /= (len(test_X_left) / batch_size)\n", - " test_acc /= (len(test_X_left) / batch_size)\n", - " \n", - " if test_acc > CURRENT_ACC:\n", - " print(\n", - " 'epoch: %d, pass acc: %f, current acc: %f'\n", - " % (EPOCH, CURRENT_ACC, test_acc)\n", - " )\n", - " CURRENT_ACC = test_acc\n", - " CURRENT_CHECKPOINT = 0\n", - " else:\n", - " CURRENT_CHECKPOINT += 1\n", - " \n", - " print('time taken:', time.time()-lasttime)\n", - " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", - " train_acc,test_loss,\n", - " test_acc))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[array([0.], dtype=float32), array([0.05150324], dtype=float32)]" 
- ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "left = str_idx(['a person is outdoors, on a horse.'], dictionary, maxlen)\n", - "right = str_idx(['a person on a horse jumps over a broken down airplane.'], dictionary, maxlen)\n", - "sess.run([model.temp_sim,1-model.distance], feed_dict = {model.X_left : left, \n", - " model.X_right: right})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/text-similarity/3.birnn-circle-loss.ipynb b/text-similarity/3.birnn-circle-loss.ipynb new file mode 100644 index 0000000..8a58fb8 --- /dev/null +++ b/text-similarity/3.birnn-circle-loss.ipynb @@ -0,0 +1,474 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('pair.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(self, size_layer, num_layers, embedded_size,\n", + " dict_size, learning_rate, dropout):\n", + " \n", + " def cells(size, reuse=False):\n", + " cell = 
tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " return tf.contrib.rnn.DropoutWrapper(cell,output_keep_prob=dropout)\n", + " \n", + " def birnn(inputs, scope):\n", + " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = inputs,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " inputs = tf.concat((out_fw, out_bw), 2)\n", + " return inputs[:,-1]\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", + " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " \n", + " self.out = birnn(embedded_left, 'left')\n", + " self.out = tf.layers.dense(self.out, size_layer)\n", + " self.out = tf.nn.l2_normalize(self.out, 1)\n", + " self.logits = tf.layers.dense(self.out,2,use_bias=False,\n", + " kernel_constraint=tf.keras.constraints.unit_norm())\n", + " print(self.logits)\n", + " \n", + " self.gamma = 64\n", + " self.margin = 0.25\n", + " self.O_p = 1 + self.margin\n", + " self.O_n = -self.margin\n", + " self.Delta_p = 1 - self.margin\n", + " self.Delta_n = self.margin\n", + " \n", + " self.batch_idxs = tf.expand_dims(\n", + " tf.range(0, self.batch_size, dtype=tf.int32), 1) # shape [batch,1]\n", + " idxs = tf.concat([self.batch_idxs, tf.cast(self.Y, tf.int32)], 1)\n", + " sp = tf.expand_dims(tf.gather_nd(self.logits, idxs), 1)\n", + " mask = tf.logical_not(\n", + " tf.scatter_nd(idxs, tf.ones(tf.shape(idxs)[0], tf.bool),\n", + " tf.shape(self.logits)))\n", + "\n", + " sn = tf.reshape(tf.boolean_mask(self.logits, mask), (self.batch_size, -1))\n", + "\n", + " alpha_p 
= tf.nn.relu(self.O_p - tf.stop_gradient(sp))\n", + " alpha_n = tf.nn.relu(tf.stop_gradient(sn) - self.O_n)\n", + "\n", + " r_sp_m = alpha_p * (sp - self.Delta_p)\n", + " r_sn_m = alpha_n * (sn - self.Delta_n)\n", + " _Z = tf.concat([r_sn_m, r_sp_m], 1)\n", + " _Z = _Z * self.gamma\n", + " # sum all similarity\n", + " logZ = tf.math.reduce_logsumexp(_Z, 1, keepdims=True)\n", + " # remove sn_p from all sum similarity\n", + " self.cost = -r_sp_m * self.gamma + logZ\n", + " self.cost = tf.reduce_mean(self.cost[:,0])\n", + " \n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y[:,0]\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 256\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "dropout = 1.0\n", + "vocab_size = 30000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + 
"WARNING:tensorflow:From :17: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :28: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "Tensor(\"dense_1/MatMul:0\", shape=(?, 2), dtype=float32)\n", + "WARNING:tensorflow:From 
/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(size_layer,num_layers,embedded_size,vocab_size,learning_rate,dropout)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['left_train', 'label_train', 'left_test', 'label_test'])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X_left = data['left_train']\n", + "train_Y = data['label_train']\n", + "test_X_left = data['left_test']\n", + "test_Y = data['label_test']" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 2046/2046 [11:42<00:00, 2.91it/s, accuracy=0.524, cost=23.9]\n", + "test minibatch loop: 100%|██████████| 105/105 [00:12<00:00, 8.33it/s, accuracy=0.542, cost=23.8]\n", + "train minibatch loop: 0%| | 0/2046 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time 
taken:', time.time()-lasttime)\n", + " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", + " train_acc,test_loss,\n", + " test_acc))" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "test minibatch loop: 100%|██████████| 105/105 [00:12<00:00, 8.20it/s, accuracy=0.783, cost=13.9]\n" + ] + } + ], + "source": [ + "test_loss, test_acc = [], []\n", + "pbar = tqdm(range(0, len(test_X_left), batch_size), desc='test minibatch loop')\n", + "for i in pbar:\n", + " index = min(i+batch_size,len(test_X_left))\n", + " batch_x_left = test_X_left[i:index]\n", + " batch_y = test_Y[i:index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " batch_x_left = pad_sequences(batch_x_left, padding='post')\n", + " acc, loss = sess.run([model.accuracy, model.cost], \n", + " feed_dict = {model.X : batch_x_left,\n", + " model.Y : batch_y})\n", + "\n", + " test_loss.append(loss)\n", + " test_acc.append(acc)\n", + " pbar.set_postfix(cost=loss, accuracy = acc)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.75812805" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.mean(test_acc)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/3.transformer-contrastive.ipynb b/text-similarity/3.transformer-contrastive.ipynb deleted file mode 100644 index ab0c837..0000000 --- 
a/text-similarity/3.transformer-contrastive.ipynb +++ /dev/null @@ -1,974 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/jupyter/.local/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", - " \"This module will be removed in 0.20.\", DeprecationWarning)\n" - ] - } - ], - "source": [ - "import tensorflow as tf\n", - "import re\n", - "import numpy as np\n", - "import pandas as pd\n", - "from tqdm import tqdm\n", - "import collections\n", - "from unidecode import unidecode\n", - "from sklearn.cross_validation import train_test_split" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3]]\n", - " count.extend(collections.Counter(words).most_common(n_words - 1))\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary\n", - "\n", - "def str_idx(corpus, dic, maxlen, UNK=3):\n", - " X = np.zeros((len(corpus),maxlen))\n", 
- " for i in range(len(corpus)):\n", - " for no, k in enumerate(corpus[i][:maxlen][::-1]):\n", - " val = dic[k] if k in dic else UNK\n", - " X[i,-1 - no]= val\n", - " return X\n", - "\n", - "def cleaning(string):\n", - " string = unidecode(string).replace('.', ' . ').replace(',', ' , ')\n", - " string = re.sub('[^A-Za-z\\- ]+', ' ', string)\n", - " string = re.sub(r'[ ]+', ' ', string).strip()\n", - " return string.lower()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "
" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([0, 1]), array([255024, 149263]))" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.unique(label, return_counts = True)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [00:07<00:00, 53664.30it/s]\n" - ] - } - ], - "source": [ - "for i in tqdm(range(len(left))):\n", - " left[i] = cleaning(left[i])\n", - " right[i] = cleaning(right[i])" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "vocab from 
size: 87661\n", - "Most common words [('the', 377593), ('what', 324635), ('is', 269934), ('i', 223893), ('how', 220876), ('a', 212757)]\n", - "Sample data [5, 6, 4, 1285, 62, 1285, 2501, 10, 564, 11] ['what', 'is', 'the', 'step', 'by', 'step', 'guide', 'to', 'invest', 'in']\n" - ] - } - ], - "source": [ - "concat = ' '.join(left + right).split()\n", - "vocabulary_size = len(list(set(concat)))\n", - "data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size)\n", - "print('vocab from size: %d'%(vocabulary_size))\n", - "print('Most common words', count[4:10])\n", - "print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "def self_attention(inputs, is_training, num_units, num_heads = 8, activation=None):\n", - " T_q = T_k = tf.shape(inputs)[1]\n", - " Q_K_V = tf.layers.dense(inputs, 3*num_units, activation)\n", - " Q, K, V = tf.split(Q_K_V, 3, -1)\n", - " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), 
0)\n", - " K_ = tf.concat(tf.split(K, num_heads, axis=2), 0)\n", - " V_ = tf.concat(tf.split(V, num_heads, axis=2), 0)\n", - " align = tf.matmul(Q_, K_, transpose_b=True)\n", - " align *= tf.rsqrt(tf.to_float(K_.get_shape()[-1].value))\n", - " paddings = tf.fill(tf.shape(align), float('-inf'))\n", - " lower_tri = tf.ones([T_q, T_k])\n", - " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense()\n", - " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0],1,1])\n", - " align = tf.where(tf.equal(masks, 0), paddings, align)\n", - " align = tf.nn.softmax(align)\n", - " align = tf.layers.dropout(align, 0.1, training=is_training) \n", - " x = tf.matmul(align, V_)\n", - " x = tf.concat(tf.split(x, num_heads, axis=0), 2)\n", - " x += inputs\n", - " x = layer_norm(x)\n", - " return x\n", - "\n", - "def ffn(inputs, hidden_dim, activation=tf.nn.relu):\n", - " x = tf.layers.conv1d(inputs, 4* hidden_dim, 1, activation=activation) \n", - " x = tf.layers.conv1d(x, hidden_dim, 1, activation=None)\n", - " x += inputs\n", - " x = layer_norm(x)\n", - " return x\n", - "\n", - "class Model:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " dict_size, learning_rate, dropout, kernel_size = 5):\n", - " \n", - " def cnn(x, scope):\n", - " x += position_encoding(x)\n", - " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", - " for n in range(num_layers):\n", - " with tf.variable_scope('attn_%d'%i,reuse=tf.AUTO_REUSE):\n", - " x = self_attention(x, True, size_layer)\n", - " with tf.variable_scope('ffn_%d'%i, reuse=tf.AUTO_REUSE):\n", - " x = ffn(x, size_layer)\n", - " \n", - " with tf.variable_scope('logits', reuse=tf.AUTO_REUSE):\n", - " return tf.layers.dense(x, size_layer)[:, -1]\n", - " \n", - " self.X_left = tf.placeholder(tf.int32, [None, None])\n", - " self.X_right = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.float32, [None])\n", - " self.batch_size = tf.shape(self.X_left)[0]\n", - " 
encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", - " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X_left)\n", - " embedded_right = tf.nn.embedding_lookup(encoder_embeddings, self.X_right)\n", - " \n", - " def contrastive_loss(y,d):\n", - " tmp= y * tf.square(d)\n", - " tmp2 = (1-y) * tf.square(tf.maximum((1 - d),0))\n", - " return tf.reduce_sum(tmp +tmp2)/tf.cast(self.batch_size,tf.float32)/2\n", - " \n", - " self.output_left = cnn(embedded_left, 'left')\n", - " self.output_right = cnn(embedded_right, 'right')\n", - " print(self.output_left, self.output_right)\n", - " self.distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(self.output_left,self.output_right)),\n", - " 1,keep_dims=True))\n", - " self.distance = tf.div(self.distance, tf.add(tf.sqrt(tf.reduce_sum(tf.square(self.output_left),\n", - " 1,keep_dims=True)),\n", - " tf.sqrt(tf.reduce_sum(tf.square(self.output_right),\n", - " 1,keep_dims=True))))\n", - " self.distance = tf.reshape(self.distance, [-1])\n", - " self.cost = contrastive_loss(self.Y,self.distance)\n", - " \n", - " self.temp_sim = tf.subtract(tf.ones_like(self.distance),\n", - " tf.rint(self.distance))\n", - " correct_predictions = tf.equal(self.temp_sim, self.Y)\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, \"float\"))\n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 1e-4\n", - "maxlen = 50\n", - "batch_size = 128\n", - "dropout = 0.8" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.cross_validation import train_test_split\n", - "\n", - "vectors_left = str_idx(left, dictionary, maxlen)\n", - "vectors_right = str_idx(right, dictionary, 
maxlen)\n", - "train_X_left, test_X_left, train_X_right, test_X_right, train_Y, test_Y = train_test_split(vectors_left,\n", - " vectors_right,\n", - " label,\n", - " test_size = 0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "WARNING:tensorflow:From :4: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n", - "WARNING:tensorflow:From :20: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dense instead.\n", - "WARNING:tensorflow:From :33: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dropout instead.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", - "WARNING:tensorflow:From :41: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.conv1d instead.\n", - "Tensor(\"left/logits/strided_slice:0\", shape=(?, 128), dtype=float32) Tensor(\"right/logits/strided_slice:0\", shape=(?, 128), dtype=float32)\n", - "WARNING:tensorflow:From :80: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "keep_dims is deprecated, use keepdims instead\n", - "WARNING:tensorflow:From :84: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Deprecated in favor of operator or tf.math.divide.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate,dropout)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:41<00:00, 25.12it/s, accuracy=0.693, cost=0.1] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 65.48it/s, accuracy=0.711, cost=0.096] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:40, 25.16it/s, accuracy=0.703, cost=0.101] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.000000, current acc: 
0.685201\n", - "time taken: 111.32214426994324\n", - "epoch: 0, training loss: 0.106726, training acc: 0.669383, valid loss: 0.103184, valid acc: 0.685201\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.08it/s, accuracy=0.733, cost=0.0915]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.03it/s, accuracy=0.722, cost=0.0919]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:41, 24.90it/s, accuracy=0.688, cost=0.104] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.685201, current acc: 0.701866\n", - "time taken: 110.18735837936401\n", - "epoch: 0, training loss: 0.100379, training acc: 0.691623, valid loss: 0.098808, valid acc: 0.701866\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.11it/s, accuracy=0.733, cost=0.0892]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.03it/s, accuracy=0.678, cost=0.095] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.28it/s, accuracy=0.711, cost=0.0951]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.701866, current acc: 0.712456\n", - "time taken: 110.06335616111755\n", - "epoch: 0, training loss: 0.096448, training acc: 0.707221, valid loss: 0.096495, valid acc: 0.712456\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.09it/s, accuracy=0.743, cost=0.0927]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 66.97it/s, accuracy=0.644, cost=0.0971]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.36it/s, accuracy=0.719, cost=0.0931]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.712456, current acc: 
0.715025\n", - "time taken: 110.16492295265198\n", - "epoch: 0, training loss: 0.093926, training acc: 0.717781, valid loss: 0.095615, valid acc: 0.715025\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.03it/s, accuracy=0.752, cost=0.0877]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 66.98it/s, accuracy=0.678, cost=0.097] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:41, 24.84it/s, accuracy=0.688, cost=0.0955]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.715025, current acc: 0.721843\n", - "time taken: 110.38844656944275\n", - "epoch: 0, training loss: 0.092020, training acc: 0.726040, valid loss: 0.094243, valid acc: 0.721843\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.11it/s, accuracy=0.723, cost=0.0882]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.09it/s, accuracy=0.667, cost=0.0952]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:41, 24.93it/s, accuracy=0.75, cost=0.0906] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.721843, current acc: 0.722270\n", - "time taken: 110.06278610229492\n", - "epoch: 0, training loss: 0.090355, training acc: 0.733065, valid loss: 0.093710, valid acc: 0.722270\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.03it/s, accuracy=0.752, cost=0.086] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 66.96it/s, accuracy=0.7, cost=0.0953] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:41, 24.94it/s, accuracy=0.742, cost=0.0918]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.722270, current acc: 
0.725934\n", - "time taken: 110.40167164802551\n", - "epoch: 0, training loss: 0.088796, training acc: 0.739814, valid loss: 0.092955, valid acc: 0.725934\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.15it/s, accuracy=0.762, cost=0.0806]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.26it/s, accuracy=0.689, cost=0.096] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:41, 24.84it/s, accuracy=0.781, cost=0.0892]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 109.86811327934265\n", - "epoch: 0, training loss: 0.087358, training acc: 0.746224, valid loss: 0.092556, valid acc: 0.725335\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.04it/s, accuracy=0.762, cost=0.0808]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.34it/s, accuracy=0.7, cost=0.0938] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:38, 25.63it/s, accuracy=0.805, cost=0.0879]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.725934, current acc: 0.729039\n", - "time taken: 110.31477642059326\n", - "epoch: 0, training loss: 0.085995, training acc: 0.751777, valid loss: 0.091761, valid acc: 0.729039\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.26it/s, accuracy=0.743, cost=0.0775]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.20it/s, accuracy=0.722, cost=0.0949]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:40, 25.17it/s, accuracy=0.727, cost=0.0899]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.729039, current acc: 0.730447\n", - "time taken: 109.4636116027832\n", - "epoch: 
0, training loss: 0.084593, training acc: 0.756880, valid loss: 0.091620, valid acc: 0.730447\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.15it/s, accuracy=0.792, cost=0.0763]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 66.96it/s, accuracy=0.711, cost=0.0971]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.33it/s, accuracy=0.781, cost=0.0882]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.730447, current acc: 0.732334\n", - "time taken: 109.93308997154236\n", - "epoch: 0, training loss: 0.083287, training acc: 0.762669, valid loss: 0.091151, valid acc: 0.732334\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.26it/s, accuracy=0.772, cost=0.0729]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.32it/s, accuracy=0.678, cost=0.098] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.40it/s, accuracy=0.781, cost=0.0819]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.732334, current acc: 0.732491\n", - "time taken: 109.41248917579651\n", - "epoch: 0, training loss: 0.082038, training acc: 0.767324, valid loss: 0.090638, valid acc: 0.732491\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.21it/s, accuracy=0.772, cost=0.0769]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.24it/s, accuracy=0.711, cost=0.0949]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:38, 25.54it/s, accuracy=0.781, cost=0.0809]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.732491, current acc: 0.734844\n", - "time taken: 109.63890266418457\n", - "epoch: 
0, training loss: 0.080769, training acc: 0.772957, valid loss: 0.090315, valid acc: 0.734844\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.16it/s, accuracy=0.822, cost=0.0687]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.61it/s, accuracy=0.744, cost=0.0907]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.38it/s, accuracy=0.781, cost=0.0854]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 109.79329133033752\n", - "epoch: 0, training loss: 0.079631, training acc: 0.777117, valid loss: 0.090068, valid acc: 0.734180\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.25it/s, accuracy=0.822, cost=0.0702]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.38it/s, accuracy=0.722, cost=0.091] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:40, 25.05it/s, accuracy=0.781, cost=0.0819]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.734844, current acc: 0.735022\n", - "time taken: 109.46223187446594\n", - "epoch: 0, training loss: 0.078417, training acc: 0.781514, valid loss: 0.089608, valid acc: 0.735022\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.14it/s, accuracy=0.782, cost=0.0686]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 66.88it/s, accuracy=0.711, cost=0.0945]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:40, 25.15it/s, accuracy=0.75, cost=0.0856] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.735022, current acc: 0.737936\n", - "time taken: 109.98049426078796\n", - "epoch: 0, training loss: 0.077204, training acc: 0.786631, valid 
loss: 0.089129, valid acc: 0.737936\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:39<00:00, 25.27it/s, accuracy=0.792, cost=0.0682]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.62it/s, accuracy=0.722, cost=0.0938]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:38, 25.51it/s, accuracy=0.836, cost=0.0775]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.737936, current acc: 0.739277\n", - "time taken: 109.33117318153381\n", - "epoch: 0, training loss: 0.076121, training acc: 0.790172, valid loss: 0.089027, valid acc: 0.739277\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.23it/s, accuracy=0.832, cost=0.067] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.35it/s, accuracy=0.7, cost=0.0949] \n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:38, 25.66it/s, accuracy=0.82, cost=0.0774] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.739277, current acc: 0.739749\n", - "time taken: 109.55670094490051\n", - "epoch: 0, training loss: 0.074985, training acc: 0.794015, valid loss: 0.088705, valid acc: 0.739749\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:39<00:00, 25.35it/s, accuracy=0.812, cost=0.0635]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.36it/s, accuracy=0.711, cost=0.0848]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:40, 25.13it/s, accuracy=0.797, cost=0.0756]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.739749, current acc: 0.740187\n", - "time taken: 109.05358052253723\n", - "epoch: 0, training loss: 0.074041, training acc: 0.797890, valid 
loss: 0.088700, valid acc: 0.740187\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.26it/s, accuracy=0.842, cost=0.0616]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.31it/s, accuracy=0.689, cost=0.0933]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:39, 25.36it/s, accuracy=0.773, cost=0.0746]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 109.43666005134583\n", - "epoch: 0, training loss: 0.072876, training acc: 0.801452, valid loss: 0.088649, valid acc: 0.739768\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.21it/s, accuracy=0.871, cost=0.0602]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.55it/s, accuracy=0.689, cost=0.0911]\n", - "train minibatch loop: 0%| | 3/2527 [00:00<01:37, 25.84it/s, accuracy=0.812, cost=0.0774]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 109.58015727996826\n", - "epoch: 0, training loss: 0.071968, training acc: 0.804654, valid loss: 0.088769, valid acc: 0.738841\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [01:40<00:00, 25.22it/s, accuracy=0.822, cost=0.0614]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:09<00:00, 67.43it/s, accuracy=0.689, cost=0.0998]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 109.57378196716309\n", - "epoch: 0, training loss: 0.070959, training acc: 0.809158, valid loss: 0.088572, valid acc: 0.739855\n", - "\n", - "break epoch:0\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "import time\n", - "\n", - "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 3, 0, 
0, 0\n", - "\n", - "while True:\n", - " lasttime = time.time()\n", - " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", - " print('break epoch:%d\\n' % (EPOCH))\n", - " break\n", - "\n", - " train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0\n", - " pbar = tqdm(range(0, len(train_X_left), batch_size), desc='train minibatch loop')\n", - " for i in pbar:\n", - " batch_x_left = train_X_left[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_x_right = train_X_right[i:min(i+batch_size,train_X_left.shape[0])]\n", - " batch_y = train_Y[i:min(i+batch_size,train_X_left.shape[0])]\n", - " acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict = {model.X_left : batch_x_left, \n", - " model.X_right: batch_x_right,\n", - " model.Y : batch_y})\n", - " assert not np.isnan(loss)\n", - " train_loss += loss\n", - " train_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " pbar = tqdm(range(0, len(test_X_left), batch_size), desc='test minibatch loop')\n", - " for i in pbar:\n", - " batch_x_left = test_X_left[i:min(i+batch_size,test_X_left.shape[0])]\n", - " batch_x_right = test_X_right[i:min(i+batch_size,test_X_left.shape[0])]\n", - " batch_y = test_Y[i:min(i+batch_size,test_X_left.shape[0])]\n", - " acc, loss = sess.run([model.accuracy, model.cost], \n", - " feed_dict = {model.X_left : batch_x_left, \n", - " model.X_right: batch_x_right,\n", - " model.Y : batch_y})\n", - " test_loss += loss\n", - " test_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " train_loss /= (len(train_X_left) / batch_size)\n", - " train_acc /= (len(train_X_left) / batch_size)\n", - " test_loss /= (len(test_X_left) / batch_size)\n", - " test_acc /= (len(test_X_left) / batch_size)\n", - " \n", - " if test_acc > CURRENT_ACC:\n", - " print(\n", - " 'epoch: %d, pass acc: %f, current acc: %f'\n", - " % (EPOCH, CURRENT_ACC, test_acc)\n", - " )\n", - " CURRENT_ACC = test_acc\n", - " CURRENT_CHECKPOINT = 0\n", - " 
else:\n", - " CURRENT_CHECKPOINT += 1\n", - " \n", - " print('time taken:', time.time()-lasttime)\n", - " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", - " train_acc,test_loss,\n", - " test_acc))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[array([0.], dtype=float32), array([0.13981318], dtype=float32)]" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "left = str_idx(['a person is outdoors, on a horse.'], dictionary, maxlen)\n", - "right = str_idx(['a person on a horse jumps over a broken down airplane.'], dictionary, maxlen)\n", - "sess.run([model.temp_sim,1-model.distance], feed_dict = {model.X_left : left, \n", - " model.X_right: right})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/text-similarity/4.birnn-proxy-anchor-loss.ipynb b/text-similarity/4.birnn-proxy-anchor-loss.ipynb new file mode 100644 index 0000000..06e985f --- /dev/null +++ b/text-similarity/4.birnn-proxy-anchor-loss.ipynb @@ -0,0 +1,380 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "import json" + ] + }, + { + 
"cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "with open('pair.json') as fopen:\n", + " data = json.load(fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(self, size_layer, num_layers, embedded_size,\n", + " dict_size, learning_rate, dropout):\n", + " \n", + " def cells(size, reuse=False):\n", + " cell = tf.nn.rnn_cell.LSTMCell(size,initializer=tf.orthogonal_initializer(),reuse=reuse)\n", + " return tf.contrib.rnn.DropoutWrapper(cell,output_keep_prob=dropout)\n", + " \n", + " def birnn(inputs, scope):\n", + " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", + " for n in range(num_layers):\n", + " (out_fw, out_bw), (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(\n", + " cell_fw = cells(size_layer // 2),\n", + " cell_bw = cells(size_layer // 2),\n", + " inputs = inputs,\n", + " dtype = tf.float32,\n", + " scope = 'bidirectional_rnn_%d'%(n))\n", + " inputs = tf.concat((out_fw, out_bw), 2)\n", + " return inputs[:,-1]\n", + " \n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " Y = tf.one_hot(self.Y, 2)\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", + " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", + " \n", + " self.out = birnn(embedded_left, 'left')\n", + " self.out = tf.layers.dense(self.out, size_layer)\n", + " self.out = tf.nn.l2_normalize(self.out, 1)\n", + " self.logits = tf.layers.dense(self.out,2,use_bias=False,\n", + " kernel_constraint=tf.keras.constraints.unit_norm())\n", + " \n", + " self.gamma = 32\n", + " self.margin = 0.1\n", + " \n", + " num_valid_proxies = tf.reduce_sum(tf.cast(tf.reduce_sum(\n", + " self.Y, 0, keepdims=True) != 0, tf.float32))\n", + " y_pred = ((Y * (self.logits - self.margin) / 
num_valid_proxies) +\n", + " ((1 - Y) * (self.logits - self.margin) / tf.cast(tf.shape(Y)[-1], tf.float32))) * self.gamma\n", + " self.cost = tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=y_pred)\n", + " self.cost = tf.reduce_mean(self.cost)\n", + " \n", + " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), tf.cast(self.Y, tf.int32)\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "size_layer = 256\n", + "num_layers = 2\n", + "embedded_size = 256\n", + "learning_rate = 1e-3\n", + "batch_size = 128\n", + "dropout = 1.0\n", + "vocab_size = 30000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From :6: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From :17: bidirectional_dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is 
equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn.py:464: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `keras.layers.RNN(cell)`, which is equivalent to this API\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:958: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.add_weight` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:962: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Call initializer instance with the dtype argument instead of passing it to the constructor\n", + "WARNING:tensorflow:From :29: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From :41: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "\n", + "Future major versions of TensorFlow will allow gradients to flow\n", + "into the labels input on backprop by default.\n", + "\n", + "See 
`tf.nn.softmax_cross_entropy_with_logits_v2`.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(size_layer,num_layers,embedded_size,vocab_size,learning_rate,dropout)\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['left_train', 'label_train', 'left_test', 'label_test'])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "train_X_left = data['left_train']\n", + "train_Y = data['label_train']\n", + "test_X_left = data['left_test']\n", + "test_Y = data['label_test']" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "pad_sequences = tf.keras.preprocessing.sequence.pad_sequences" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 2046/2046 [10:55<00:00, 3.12it/s, accuracy=0.5, cost=5.26e-5] \n", + "test minibatch loop: 100%|██████████| 105/105 [00:12<00:00, 8.25it/s, accuracy=0.542, cost=4.93e-5]\n", + "train minibatch loop: 0%| | 0/2046 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " 
CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time()-lasttime)\n", + " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", + " train_acc,test_loss,\n", + " test_acc))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/4.dilated-cnn-crossentropy.ipynb b/text-similarity/4.dilated-cnn-crossentropy.ipynb deleted file mode 100644 index 40bc3dd..0000000 --- a/text-similarity/4.dilated-cnn-crossentropy.ipynb +++ /dev/null @@ -1,746 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/jupyter/.local/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. 
This module will be removed in 0.20.\n", - " \"This module will be removed in 0.20.\", DeprecationWarning)\n" - ] - } - ], - "source": [ - "import tensorflow as tf\n", - "import re\n", - "import numpy as np\n", - "import pandas as pd\n", - "from tqdm import tqdm\n", - "import collections\n", - "from unidecode import unidecode\n", - "from sklearn.cross_validation import train_test_split" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "def build_dataset(words, n_words):\n", - " count = [['PAD', 0], ['GO', 1], ['EOS', 2], ['UNK', 3], ['SEPARATOR', 4]]\n", - " count.extend(collections.Counter(words).most_common(n_words - 1))\n", - " dictionary = dict()\n", - " for word, _ in count:\n", - " dictionary[word] = len(dictionary)\n", - " data = list()\n", - " unk_count = 0\n", - " for word in words:\n", - " index = dictionary.get(word, 0)\n", - " if index == 0:\n", - " unk_count += 1\n", - " data.append(index)\n", - " count[0][1] = unk_count\n", - " reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))\n", - " return data, count, dictionary, reversed_dictionary\n", - "\n", - "def str_idx(corpus, dic, maxlen, UNK=3):\n", - " X = np.zeros((len(corpus),maxlen))\n", - " for i in range(len(corpus)):\n", - " for no, k in enumerate(corpus[i][:maxlen][::-1]):\n", - " val = dic[k] if k in dic else UNK\n", - " X[i,-1 - no]= val\n", - " return X\n", - "\n", - "def cleaning(string):\n", - " string = unidecode(string).replace('.', ' . ').replace(',', ' , ')\n", - " string = re.sub('[^A-Za-z\\- ]+', ' ', string)\n", - " string = re.sub(r'[ ]+', ' ', string).strip()\n", - " return string.lower()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "
" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([0, 1]), array([255024, 149263]))" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.unique(label, return_counts = True)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [00:07<00:00, 51783.93it/s]\n" - ] - } - ], - "source": [ - "for i in tqdm(range(len(left))):\n", - " left[i] = cleaning(left[i])\n", - " right[i] = cleaning(right[i])\n", - " left[i] = left[i] + ' SEPARATOR ' + right[i]" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - 
"output_type": "stream", - "text": [ - "vocab from size: 87662\n", - "Most common words [['SEPARATOR', 4], ('SEPARATOR', 404287), ('the', 377593), ('what', 324635), ('is', 269934), ('i', 223893)]\n", - "Sample data [6, 7, 5, 1286, 63, 1286, 2502, 11, 565, 12] ['what', 'is', 'the', 'step', 'by', 'step', 'guide', 'to', 'invest', 'in']\n" - ] - } - ], - "source": [ - "concat = ' '.join(left).split()\n", - "vocabulary_size = len(list(set(concat)))\n", - "data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size)\n", - "print('vocab from size: %d'%(vocabulary_size))\n", - "print('Most common words', count[4:10])\n", - "print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "def cnn_block(x, dilation_rate, pad_sz, hidden_dim, kernel_size):\n", - " x = layer_norm(x)\n", - " pad = tf.zeros([tf.shape(x)[0], pad_sz, hidden_dim])\n", - " x = tf.layers.conv1d(inputs = tf.concat([pad, x, pad], 1),\n", - " filters 
= hidden_dim,\n", - " kernel_size = kernel_size,\n", - " dilation_rate = dilation_rate)\n", - " x = x[:, :-pad_sz, :]\n", - " x = tf.nn.relu(x)\n", - " return x\n", - "\n", - "class Model:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " dict_size, learning_rate, dropout, kernel_size = 5):\n", - " \n", - " def cnn(x, scope):\n", - " x += position_encoding(x)\n", - " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", - " for n in range(num_layers):\n", - " dilation_rate = 2 ** n\n", - " pad_sz = (kernel_size - 1) * dilation_rate \n", - " with tf.variable_scope('block_%d'%i,reuse=tf.AUTO_REUSE):\n", - " x += cnn_block(x, dilation_rate, pad_sz, size_layer, kernel_size)\n", - " \n", - " with tf.variable_scope('logits', reuse=tf.AUTO_REUSE):\n", - " return tf.layers.dense(x, size_layer)[:, -1]\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None])\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size], -1, 1))\n", - " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " self.logits = cnn(embedded_left, 'left')\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " logits = self.logits, labels = self.Y\n", - " )\n", - " )\n", - " \n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " correct_pred = tf.equal(\n", - " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", - " )\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 1e-3\n", - "maxlen = 50\n", - "batch_size = 128\n", - "dropout = 0.8" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - 
"source": [ - "from sklearn.cross_validation import train_test_split\n", - "\n", - "vectors = str_idx(left, dictionary, maxlen)\n", - "train_X, test_X, train_Y, test_Y = train_test_split(vectors, label, test_size = 0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py:1702: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", - " warnings.warn('An interactive session is already active. This can '\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate,dropout)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:32<00:00, 77.42it/s, accuracy=0.584, cost=0.645]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 241.53it/s, accuracy=0.678, cost=0.624]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.72it/s, accuracy=0.664, cost=0.638]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.000000, current acc: 0.649024\n", - "time taken: 35.25988554954529\n", - "epoch: 0, training loss: 
0.639172, training acc: 0.645532, valid loss: 0.625583, valid acc: 0.649024\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.78it/s, accuracy=0.653, cost=0.605]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 249.91it/s, accuracy=0.756, cost=0.568]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.57it/s, accuracy=0.648, cost=0.625]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.649024, current acc: 0.686088\n", - "time taken: 33.05776762962341\n", - "epoch: 0, training loss: 0.599088, training acc: 0.681601, valid loss: 0.593228, valid acc: 0.686088\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.65it/s, accuracy=0.703, cost=0.568]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.40it/s, accuracy=0.7, cost=0.548] \n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 83.52it/s, accuracy=0.672, cost=0.615]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.686088, current acc: 0.700928\n", - "time taken: 33.1018283367157\n", - "epoch: 0, training loss: 0.572584, training acc: 0.705614, valid loss: 0.578908, valid acc: 0.700928\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.66it/s, accuracy=0.723, cost=0.556]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 251.01it/s, accuracy=0.778, cost=0.521]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 83.45it/s, accuracy=0.703, cost=0.604]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.700928, current acc: 0.705392\n", - "time taken: 33.0923171043396\n", - "epoch: 0, training loss: 0.550349, 
training acc: 0.723289, valid loss: 0.573883, valid acc: 0.705392\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.63it/s, accuracy=0.733, cost=0.545]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 249.63it/s, accuracy=0.767, cost=0.526]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 83.85it/s, accuracy=0.727, cost=0.582]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.705392, current acc: 0.706215\n", - "time taken: 33.11521649360657\n", - "epoch: 0, training loss: 0.530263, training acc: 0.737710, valid loss: 0.574223, valid acc: 0.706215\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.78it/s, accuracy=0.703, cost=0.507]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 249.87it/s, accuracy=0.722, cost=0.548]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.88it/s, accuracy=0.68, cost=0.566] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.706215, current acc: 0.712823\n", - "time taken: 33.06076192855835\n", - "epoch: 0, training loss: 0.512012, training acc: 0.749806, valid loss: 0.572262, valid acc: 0.712823\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.65it/s, accuracy=0.713, cost=0.539]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 249.05it/s, accuracy=0.711, cost=0.54] \n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.51it/s, accuracy=0.672, cost=0.576]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.712823, current acc: 0.715365\n", - "time taken: 33.11697006225586\n", - "epoch: 0, training loss: 0.495308, training 
acc: 0.760959, valid loss: 0.575378, valid acc: 0.715365\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.82it/s, accuracy=0.713, cost=0.55] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.17it/s, accuracy=0.689, cost=0.578]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.39it/s, accuracy=0.719, cost=0.558]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.715365, current acc: 0.718076\n", - "time taken: 33.03995633125305\n", - "epoch: 0, training loss: 0.480132, training acc: 0.770668, valid loss: 0.576161, valid acc: 0.718076\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.96it/s, accuracy=0.723, cost=0.532]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.37it/s, accuracy=0.689, cost=0.571]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 83.53it/s, accuracy=0.734, cost=0.556]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.718076, current acc: 0.718397\n", - "time taken: 32.98867201805115\n", - "epoch: 0, training loss: 0.466953, training acc: 0.778197, valid loss: 0.585377, valid acc: 0.718397\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.74it/s, accuracy=0.693, cost=0.579]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.47it/s, accuracy=0.722, cost=0.579]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 83.07it/s, accuracy=0.727, cost=0.532]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.718397, current acc: 0.719860\n", - "time taken: 33.069570541381836\n", - "epoch: 0, training loss: 0.454996, training acc: 
0.786085, valid loss: 0.589913, valid acc: 0.719860\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.83it/s, accuracy=0.703, cost=0.545]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.12it/s, accuracy=0.744, cost=0.56] \n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.83it/s, accuracy=0.711, cost=0.518]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.719860, current acc: 0.722752\n", - "time taken: 33.03630042076111\n", - "epoch: 0, training loss: 0.443845, training acc: 0.792981, valid loss: 0.597150, valid acc: 0.722752\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.84it/s, accuracy=0.743, cost=0.536]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 249.00it/s, accuracy=0.744, cost=0.57] \n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.01it/s, accuracy=0.75, cost=0.504] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 33.04733848571777\n", - "epoch: 0, training loss: 0.433595, training acc: 0.798370, valid loss: 0.605825, valid acc: 0.720378\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.71it/s, accuracy=0.762, cost=0.505]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 250.76it/s, accuracy=0.756, cost=0.567]\n", - "train minibatch loop: 0%| | 9/2527 [00:00<00:30, 82.74it/s, accuracy=0.75, cost=0.51] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 33.075902462005615\n", - "epoch: 0, training loss: 0.423926, training acc: 0.803343, valid loss: 0.617053, valid acc: 0.721669\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - 
"text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:30<00:00, 82.80it/s, accuracy=0.723, cost=0.501]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:02<00:00, 251.00it/s, accuracy=0.778, cost=0.559]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 33.04087018966675\n", - "epoch: 0, training loss: 0.415806, training acc: 0.808235, valid loss: 0.627675, valid acc: 0.719070\n", - "\n", - "break epoch:0\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "import time\n", - "\n", - "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 3, 0, 0, 0\n", - "\n", - "while True:\n", - " lasttime = time.time()\n", - " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", - " print('break epoch:%d\\n' % (EPOCH))\n", - " break\n", - "\n", - " train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0\n", - " pbar = tqdm(range(0, len(train_X), batch_size), desc='train minibatch loop')\n", - " for i in pbar:\n", - " batch_x = train_X[i:min(i+batch_size,train_X.shape[0])]\n", - " batch_y = train_Y[i:min(i+batch_size,train_X.shape[0])]\n", - " acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict = {model.X : batch_x,\n", - " model.Y : batch_y})\n", - " assert not np.isnan(loss)\n", - " train_loss += loss\n", - " train_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " pbar = tqdm(range(0, len(test_X), batch_size), desc='test minibatch loop')\n", - " for i in pbar:\n", - " batch_x = test_X[i:min(i+batch_size,test_X.shape[0])]\n", - " batch_y = test_Y[i:min(i+batch_size,test_X.shape[0])]\n", - " acc, loss = sess.run([model.accuracy, model.cost], \n", - " feed_dict = {model.X : batch_x,\n", - " model.Y : batch_y})\n", - " test_loss += loss\n", - " test_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " train_loss /= (len(train_X) / batch_size)\n", - " train_acc /= 
(len(train_X) / batch_size)\n", - " test_loss /= (len(test_X) / batch_size)\n", - " test_acc /= (len(test_X) / batch_size)\n", - " \n", - " if test_acc > CURRENT_ACC:\n", - " print(\n", - " 'epoch: %d, pass acc: %f, current acc: %f'\n", - " % (EPOCH, CURRENT_ACC, test_acc)\n", - " )\n", - " CURRENT_ACC = test_acc\n", - " CURRENT_CHECKPOINT = 0\n", - " else:\n", - " CURRENT_CHECKPOINT += 1\n", - " \n", - " print('time taken:', time.time()-lasttime)\n", - " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", - " train_acc,test_loss,\n", - " test_acc))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/text-similarity/5.bert-base-cross-entropy.ipynb b/text-similarity/5.bert-base-cross-entropy.ipynb new file mode 100644 index 0000000..f5cc576 --- /dev/null +++ b/text-similarity/5.bert-base-cross-entropy.ipynb @@ -0,0 +1,580 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From 
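The loop in the cell above stops training after `EARLY_STOPPING` consecutive epochs without a validation-accuracy improvement. A minimal framework-agnostic sketch of the same patience logic (`evaluate` is a hypothetical callback that runs one epoch and returns validation accuracy):

```python
# Patience-based early stopping, mirroring the notebook's loop: stop after
# `patience` consecutive epochs without a validation improvement.
def train_with_early_stopping(evaluate, patience=3, max_epochs=100):
    best_acc, bad_epochs = 0.0, 0
    for epoch in range(max_epochs):
        acc = evaluate(epoch)  # hypothetical callback: one epoch -> valid acc
        if acc > best_acc:
            best_acc, bad_epochs = acc, 0
        else:
            bad_epochs += 1
        # same break condition as CURRENT_CHECKPOINT == EARLY_STOPPING above
        if bad_epochs == patience:
            break
    return best_acc

# Toy run: accuracy improves twice, then plateaus and decays.
history = [0.70, 0.72, 0.72, 0.71, 0.70, 0.69]
best = train_with_early_stopping(lambda e: history[e])
print(best)  # 0.72
```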
/home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenization.validate_case_matches_checkpoint(True, '')\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)\n", + "MAX_SEQ_LENGTH = 120" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " 
tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + "\n", + "def get_data(left, right):\n", + " input_ids, input_masks, segment_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens_a = tokenizer.tokenize(left[i])\n", + " tokens_b = tokenizer.tokenize(right[i])\n", + " _truncate_seq_pair(tokens_a, tokens_b, MAX_SEQ_LENGTH - 3)\n", + " tokens = []\n", + " segment_id = []\n", + " tokens.append(\"[CLS]\")\n", + " segment_id.append(0)\n", + " for token in tokens_a:\n", + " tokens.append(token)\n", + " segment_id.append(0)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(0)\n", + " for token in tokens_b:\n", + " tokens.append(token)\n", + " segment_id.append(1)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(1)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " while len(input_id) < MAX_SEQ_LENGTH:\n", + " input_id.append(0)\n", + " input_mask.append(0)\n", + " segment_id.append(0)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " return input_ids, input_masks, segment_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [02:41<00:00, 1625.51it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:10<00:00, 1282.59it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = 
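The `_truncate_seq_pair` helper above can be exercised standalone; popping from whichever side is currently longer keeps the pair balanced rather than truncating only the second sentence:

```python
# Standalone copy of _truncate_seq_pair above: trim the longer of the two
# token lists from the end until the pair fits in max_length tokens.
def truncate_seq_pair(tokens_a, tokens_b, max_length):
    while len(tokens_a) + len(tokens_b) > max_length:
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()

a = list('abcdefgh')  # 8 tokens
b = list('xyz')       # 3 tokens
truncate_seq_pair(a, b, 6)
print(a, b)  # both sides end up with 3 tokens
```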
data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "epoch = 10\n", + "batch_size = 60\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(left) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=True,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " self.logits = tf.layers.dense(output_layer, dimension_output)\n", + " self.logits = tf.identity(self.logits, name = 'logits')\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = 
tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `rate` instead of `keep_prob`. 
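`get_data` above packs each pair as `[CLS] A [SEP] B [SEP]`, with segment id 0 covering `[CLS] A [SEP]` and 1 covering `B [SEP]`, then zero-pads to `MAX_SEQ_LENGTH`. A toy sketch with plain word lists standing in for WordPiece tokens:

```python
# Toy sketch of the [CLS] A [SEP] B [SEP] packing done in get_data above.
def pack_pair(tokens_a, tokens_b, max_seq_length):
    tokens = ['[CLS]'] + tokens_a + ['[SEP]'] + tokens_b + ['[SEP]']
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_mask = [1] * len(tokens)
    pad = max_seq_length - len(tokens)  # zero-pad out to the fixed length
    return (tokens + ['[PAD]'] * pad,
            input_mask + [0] * pad,
            segment_ids + [0] * pad)

tokens, mask, segments = pack_pair(['how', 'are', 'you'], ['i', 'am', 'fine'], 12)
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```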
Rate should be set to `rate = 1 - keep_prob`.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. 
Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/math_grad.py:1375: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "learning_rate = 2e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + " learning_rate\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 4364/4364 [37:12<00:00, 1.95it/s, accuracy=0.955, cost=0.24] \n", + "test minibatch loop: 100%|██████████| 224/224 [00:40<00:00, 5.48it/s, accuracy=1, cost=0.0601] \n", + "train minibatch loop: 0%| | 0/4364 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " 
print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "tqdm._instances.clear()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "test minibatch loop: 0%| | 0/224 [00:49\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
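The optimizer built earlier via `optimization.create_optimizer` warms the learning rate up over `num_warmup_steps` and then decays it toward zero over `num_train_steps`. An idealized pure-Python sketch (the library's actual decay is a polynomial decay computed from step 0; treat this as an approximation):

```python
# Idealized warmup-then-linear-decay schedule: linear warmup to the peak
# rate over num_warmup_steps, then linear decay to zero by num_train_steps.
def lr_at_step(step, peak_lr=2e-5, num_train_steps=1000, num_warmup_steps=100):
    if step < num_warmup_steps:
        return peak_lr * step / num_warmup_steps
    frac = (step - num_warmup_steps) / (num_train_steps - num_warmup_steps)
    return peak_lr * (1.0 - frac)

print(lr_at_step(50))    # halfway through warmup
print(lr_at_step(1000))  # end of training: decayed to 0.0
```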
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([0, 1]), array([255024, 149263]))" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "np.unique(label, return_counts = True)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [00:07<00:00, 52786.23it/s]\n" - ] - } - ], - "source": [ - "for i in tqdm(range(len(left))):\n", - " left[i] = cleaning(left[i])\n", - " right[i] = cleaning(right[i])\n", - " left[i] = left[i] + ' SEPARATOR ' + right[i]" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "vocab from size: 87662\n", - "Most common words [['SEPARATOR', 4], ('SEPARATOR', 404287), ('the', 377593), ('what', 324635), ('is', 269934), ('i', 223893)]\n", - "Sample data [6, 7, 5, 1286, 63, 1286, 2502, 11, 565, 12] ['what', 'is', 'the', 'step', 'by', 'step', 'guide', 'to', 'invest', 'in']\n" - ] - } - ], - "source": [ - "concat = ' '.join(left).split()\n", - "vocabulary_size = len(list(set(concat)))\n", - "data, count, dictionary, rev_dictionary = build_dataset(concat, vocabulary_size)\n", - "print('vocab from size: %d'%(vocabulary_size))\n", - "print('Most common words', count[4:10])\n", - "print('Sample data', data[:10], [rev_dictionary[i] for i in data[:10]])" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def position_encoding(inputs):\n", - " T = tf.shape(inputs)[1]\n", - " repr_dim = inputs.get_shape()[-1].value\n", - " pos = tf.reshape(tf.range(0.0, tf.to_float(T), dtype=tf.float32), [-1, 1])\n", - " i = np.arange(0, repr_dim, 2, np.float32)\n", - " denom = np.reshape(np.power(10000.0, i / repr_dim), [1, -1])\n", - " enc = tf.expand_dims(tf.concat([tf.sin(pos / denom), tf.cos(pos / denom)], 1), 0)\n", - " return tf.tile(enc, [tf.shape(inputs)[0], 1, 1])\n", - "\n", - "def layer_norm(inputs, epsilon=1e-8):\n", - " mean, variance = tf.nn.moments(inputs, [-1], keep_dims=True)\n", - " normalized = (inputs - mean) / (tf.sqrt(variance + epsilon))\n", - " params_shape = inputs.get_shape()[-1:]\n", - " gamma = tf.get_variable('gamma', params_shape, tf.float32, tf.ones_initializer())\n", - " beta = tf.get_variable('beta', params_shape, tf.float32, tf.zeros_initializer())\n", - " return gamma * normalized + beta\n", - "\n", - "def self_attention(inputs, is_training, num_units, num_heads = 8, activation=None):\n", - " T_q = T_k = tf.shape(inputs)[1]\n", - " Q_K_V = tf.layers.dense(inputs, 3*num_units, activation)\n", - " Q, K, V = tf.split(Q_K_V, 3, 
-1)\n", - " Q_ = tf.concat(tf.split(Q, num_heads, axis=2), 0)\n", - " K_ = tf.concat(tf.split(K, num_heads, axis=2), 0)\n", - " V_ = tf.concat(tf.split(V, num_heads, axis=2), 0)\n", - " align = tf.matmul(Q_, K_, transpose_b=True)\n", - " align *= tf.rsqrt(tf.to_float(K_.get_shape()[-1].value))\n", - " paddings = tf.fill(tf.shape(align), float('-inf'))\n", - " lower_tri = tf.ones([T_q, T_k])\n", - " lower_tri = tf.linalg.LinearOperatorLowerTriangular(lower_tri).to_dense()\n", - " masks = tf.tile(tf.expand_dims(lower_tri,0), [tf.shape(align)[0],1,1])\n", - " align = tf.where(tf.equal(masks, 0), paddings, align)\n", - " align = tf.nn.softmax(align)\n", - " align = tf.layers.dropout(align, 0.1, training=is_training) \n", - " x = tf.matmul(align, V_)\n", - " x = tf.concat(tf.split(x, num_heads, axis=0), 2)\n", - " x += inputs\n", - " x = layer_norm(x)\n", - " return x\n", - "\n", - "def ffn(inputs, hidden_dim, activation=tf.nn.relu):\n", - " x = tf.layers.conv1d(inputs, 4* hidden_dim, 1, activation=activation) \n", - " x = tf.layers.conv1d(x, hidden_dim, 1, activation=None)\n", - " x += inputs\n", - " x = layer_norm(x)\n", - " return x\n", - "\n", - "class Model:\n", - " def __init__(self, size_layer, num_layers, embedded_size,\n", - " dict_size, learning_rate, dropout, kernel_size = 5):\n", - " \n", - " def cnn(x, scope):\n", - " x += position_encoding(x)\n", - " with tf.variable_scope(scope, reuse = tf.AUTO_REUSE):\n", - " for n in range(num_layers):\n", - " with tf.variable_scope('attn_%d'%n,reuse=tf.AUTO_REUSE):\n", - " x = self_attention(x, True, size_layer)\n", - " with tf.variable_scope('ffn_%d'%n, reuse=tf.AUTO_REUSE):\n", - " x = ffn(x, size_layer)\n", - " \n", - " with tf.variable_scope('logits', reuse=tf.AUTO_REUSE):\n", - " return tf.layers.dense(x, 2)[:, -1]\n", - " \n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None])\n", - " encoder_embeddings = tf.Variable(tf.random_uniform([dict_size, embedded_size],
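`self_attention` above builds a lower-triangular mask so each position attends only to itself and earlier positions, filling masked scores with `-inf` before the softmax. A NumPy sketch of that masking step:

```python
import numpy as np

# Causal (lower-triangular) attention masking: masked scores become -inf,
# so they contribute zero probability after the softmax.
def causal_softmax(scores):
    T = scores.shape[-1]
    mask = np.tril(np.ones((T, T)))  # ones on and below the diagonal
    masked = np.where(mask == 0, -np.inf, scores)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

attn = causal_softmax(np.zeros((3, 3)))
print(attn[0])  # first position attends only to itself: [1. 0. 0.]
```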
-1, 1))\n", - " embedded_left = tf.nn.embedding_lookup(encoder_embeddings, self.X)\n", - " \n", - " self.logits = cnn(embedded_left, 'left')\n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " logits = self.logits, labels = self.Y\n", - " )\n", - " )\n", - " \n", - " self.optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(self.cost)\n", - " correct_pred = tf.equal(\n", - " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", - " )\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "size_layer = 128\n", - "num_layers = 4\n", - "embedded_size = 128\n", - "learning_rate = 1e-4\n", - "maxlen = 50\n", - "batch_size = 128\n", - "dropout = 0.8" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.cross_validation import train_test_split\n", - "\n", - "vectors = str_idx(left, dictionary, maxlen)\n", - "train_X, test_X, train_Y, test_Y = train_test_split(vectors, label, test_size = 0.2)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "WARNING:tensorflow:From :4: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n", - "WARNING:tensorflow:From :20: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - 
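The `position_encoding` function defined above concatenates a block of sines with a block of cosines along the feature axis (rather than interleaving them). The same computation in NumPy:

```python
import numpy as np

# Sinusoidal position encoding, matching the TF version above:
# sin(pos / 10000^(i/d)) for the first d/2 features, cos(...) for the rest.
def position_encoding(T, d):
    pos = np.arange(T, dtype=np.float32).reshape(-1, 1)
    i = np.arange(0, d, 2, dtype=np.float32)
    denom = np.power(10000.0, i / d).reshape(1, -1)
    return np.concatenate([np.sin(pos / denom), np.cos(pos / denom)], axis=1)

enc = position_encoding(50, 128)
print(enc.shape)  # (50, 128): one row per position
```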
"Instructions for updating:\n", - "Use keras.layers.dense instead.\n", - "WARNING:tensorflow:From :33: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dropout instead.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.\n", - "WARNING:tensorflow:From :41: conv1d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.conv1d instead.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n" - ] - } - ], - "source": [ - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model(size_layer,num_layers,embedded_size,len(dictionary),learning_rate,dropout)\n", - "sess.run(tf.global_variables_initializer())" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.20it/s, accuracy=0.663, cost=0.652]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 110.07it/s, accuracy=0.644, cost=0.674]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.61it/s, accuracy=0.648, cost=0.617]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.000000, current acc: 0.654326\n", - "time taken: 
60.44020199775696\n", - "epoch: 0, training loss: 0.639404, training acc: 0.640978, valid loss: 0.628099, valid acc: 0.654326\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.62it/s, accuracy=0.663, cost=0.619]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.44it/s, accuracy=0.622, cost=0.669]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.00it/s, accuracy=0.68, cost=0.62] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.654326, current acc: 0.667128\n", - "time taken: 59.827545404434204\n", - "epoch: 0, training loss: 0.621935, training acc: 0.659585, valid loss: 0.614735, valid acc: 0.667128\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.69it/s, accuracy=0.683, cost=0.577]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.01it/s, accuracy=0.6, cost=0.683] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.61it/s, accuracy=0.68, cost=0.621] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.667128, current acc: 0.672164\n", - "time taken: 59.77066659927368\n", - "epoch: 0, training loss: 0.610259, training acc: 0.670584, valid loss: 0.608394, valid acc: 0.672164\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.65it/s, accuracy=0.713, cost=0.564]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 111.70it/s, accuracy=0.656, cost=0.666]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.84it/s, accuracy=0.711, cost=0.604]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.672164, current acc: 0.679227\n", - "time taken: 
59.83059549331665\n", - "epoch: 0, training loss: 0.601291, training acc: 0.679090, valid loss: 0.602495, valid acc: 0.679227\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.56it/s, accuracy=0.703, cost=0.556]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.42it/s, accuracy=0.6, cost=0.659] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.75it/s, accuracy=0.695, cost=0.601]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.679227, current acc: 0.685867\n", - "time taken: 59.903602838516235\n", - "epoch: 0, training loss: 0.592938, training acc: 0.687245, valid loss: 0.597082, valid acc: 0.685867\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 46.87it/s, accuracy=0.743, cost=0.548]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 111.96it/s, accuracy=0.633, cost=0.672]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.98it/s, accuracy=0.695, cost=0.585]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.685867, current acc: 0.688751\n", - "time taken: 59.562599897384644\n", - "epoch: 0, training loss: 0.585165, training acc: 0.693349, valid loss: 0.592944, valid acc: 0.688751\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 46.82it/s, accuracy=0.752, cost=0.529]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.44it/s, accuracy=0.622, cost=0.704]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.92it/s, accuracy=0.719, cost=0.585]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.688751, current acc: 0.692926\n", - "time taken: 
59.60137748718262\n", - "epoch: 0, training loss: 0.577756, training acc: 0.700359, valid loss: 0.590633, valid acc: 0.692926\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:54<00:00, 46.72it/s, accuracy=0.733, cost=0.524]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.66it/s, accuracy=0.622, cost=0.695]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 46.71it/s, accuracy=0.719, cost=0.597]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.692926, current acc: 0.694126\n", - "time taken: 59.701225996017456\n", - "epoch: 0, training loss: 0.570621, training acc: 0.705953, valid loss: 0.587987, valid acc: 0.694126\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.07it/s, accuracy=0.743, cost=0.517]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.60it/s, accuracy=0.667, cost=0.664]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.16it/s, accuracy=0.75, cost=0.59] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.694126, current acc: 0.697845\n", - "time taken: 59.29985284805298\n", - "epoch: 0, training loss: 0.563849, training acc: 0.711581, valid loss: 0.585073, valid acc: 0.697845\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 46.92it/s, accuracy=0.752, cost=0.49] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.64it/s, accuracy=0.689, cost=0.684]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.25it/s, accuracy=0.734, cost=0.591]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.697845, current acc: 0.699698\n", - "time taken: 
59.466017723083496\n", - "epoch: 0, training loss: 0.557104, training acc: 0.716393, valid loss: 0.583814, valid acc: 0.699698\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 46.91it/s, accuracy=0.733, cost=0.527]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.03it/s, accuracy=0.644, cost=0.68] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.28it/s, accuracy=0.75, cost=0.56] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.699698, current acc: 0.700679\n", - "time taken: 59.46453809738159\n", - "epoch: 0, training loss: 0.551015, training acc: 0.721082, valid loss: 0.580544, valid acc: 0.700679\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.04it/s, accuracy=0.762, cost=0.522]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.48it/s, accuracy=0.678, cost=0.651]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.44it/s, accuracy=0.758, cost=0.556]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.700679, current acc: 0.702092\n", - "time taken: 59.29327607154846\n", - "epoch: 0, training loss: 0.545043, training acc: 0.725462, valid loss: 0.581033, valid acc: 0.702092\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.21it/s, accuracy=0.762, cost=0.516]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.11it/s, accuracy=0.7, cost=0.654] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.67it/s, accuracy=0.727, cost=0.55] " - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.702092, current acc: 0.702943\n", - "time taken: 
59.11387062072754\n", - "epoch: 0, training loss: 0.539628, training acc: 0.729723, valid loss: 0.581183, valid acc: 0.702943\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.15it/s, accuracy=0.762, cost=0.502]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.97it/s, accuracy=0.633, cost=0.693]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:52, 47.68it/s, accuracy=0.758, cost=0.545]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.702943, current acc: 0.705497\n", - "time taken: 59.19653916358948\n", - "epoch: 0, training loss: 0.533567, training acc: 0.734188, valid loss: 0.578577, valid acc: 0.705497\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.06it/s, accuracy=0.743, cost=0.483]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.83it/s, accuracy=0.644, cost=0.721]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.13it/s, accuracy=0.727, cost=0.544]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.705497, current acc: 0.709658\n", - "time taken: 59.30323553085327\n", - "epoch: 0, training loss: 0.528961, training acc: 0.737278, valid loss: 0.575870, valid acc: 0.709658\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.01it/s, accuracy=0.782, cost=0.481]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.54it/s, accuracy=0.7, cost=0.699] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:52, 47.92it/s, accuracy=0.805, cost=0.487]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 59.32865643501282\n", - "epoch: 0, training loss: 0.522808, training 
acc: 0.741622, valid loss: 0.579368, valid acc: 0.706827\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.29it/s, accuracy=0.733, cost=0.481]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.10it/s, accuracy=0.622, cost=0.675]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.33it/s, accuracy=0.789, cost=0.505]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 59.023605823516846\n", - "epoch: 0, training loss: 0.517364, training acc: 0.744728, valid loss: 0.578737, valid acc: 0.709103\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.16it/s, accuracy=0.792, cost=0.454]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.06it/s, accuracy=0.567, cost=0.64] \n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:52, 47.79it/s, accuracy=0.789, cost=0.486]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "epoch: 0, pass acc: 0.709658, current acc: 0.711080\n", - "time taken: 59.17823839187622\n", - "epoch: 0, training loss: 0.512706, training acc: 0.748938, valid loss: 0.575415, valid acc: 0.711080\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.10it/s, accuracy=0.782, cost=0.43] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 112.75it/s, accuracy=0.656, cost=0.655]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:54, 46.70it/s, accuracy=0.766, cost=0.531]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 59.26551961898804\n", - "epoch: 0, training loss: 0.507218, training acc: 0.751649, valid loss: 0.579230, valid acc: 0.709997\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", 
- "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.01it/s, accuracy=0.832, cost=0.41] \n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.18it/s, accuracy=0.622, cost=0.669]\n", - "train minibatch loop: 0%| | 5/2527 [00:00<00:53, 47.25it/s, accuracy=0.734, cost=0.526]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 59.346855878829956\n", - "epoch: 0, training loss: 0.502882, training acc: 0.755138, valid loss: 0.583503, valid acc: 0.707928\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 2527/2527 [00:53<00:00, 47.27it/s, accuracy=0.802, cost=0.441]\n", - "test minibatch loop: 100%|██████████| 632/632 [00:05<00:00, 113.33it/s, accuracy=0.622, cost=0.659]" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "time taken: 59.0352988243103\n", - "epoch: 0, training loss: 0.498010, training acc: 0.757788, valid loss: 0.579649, valid acc: 0.709758\n", - "\n", - "break epoch:0\n", - "\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "\n" - ] - } - ], - "source": [ - "import time\n", - "\n", - "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 3, 0, 0, 0\n", - "\n", - "while True:\n", - " lasttime = time.time()\n", - " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", - " print('break epoch:%d\\n' % (EPOCH))\n", - " break\n", - "\n", - " train_acc, train_loss, test_acc, test_loss = 0, 0, 0, 0\n", - " pbar = tqdm(range(0, len(train_X), batch_size), desc='train minibatch loop')\n", - " for i in pbar:\n", - " batch_x = train_X[i:min(i+batch_size,train_X.shape[0])]\n", - " batch_y = train_Y[i:min(i+batch_size,train_X.shape[0])]\n", - " acc, loss, _ = sess.run([model.accuracy, model.cost, model.optimizer], \n", - " feed_dict = {model.X : batch_x,\n", - " model.Y : batch_y})\n", - " assert not np.isnan(loss)\n", - " train_loss += loss\n", - " train_acc += acc\n", - 
" pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " pbar = tqdm(range(0, len(test_X), batch_size), desc='test minibatch loop')\n", - " for i in pbar:\n", - " batch_x = test_X[i:min(i+batch_size,test_X.shape[0])]\n", - " batch_y = test_Y[i:min(i+batch_size,test_X.shape[0])]\n", - " acc, loss = sess.run([model.accuracy, model.cost], \n", - " feed_dict = {model.X : batch_x,\n", - " model.Y : batch_y})\n", - " test_loss += loss\n", - " test_acc += acc\n", - " pbar.set_postfix(cost=loss, accuracy = acc)\n", - " \n", - " train_loss /= (len(train_X) / batch_size)\n", - " train_acc /= (len(train_X) / batch_size)\n", - " test_loss /= (len(test_X) / batch_size)\n", - " test_acc /= (len(test_X) / batch_size)\n", - " \n", - " if test_acc > CURRENT_ACC:\n", - " print(\n", - " 'epoch: %d, pass acc: %f, current acc: %f'\n", - " % (EPOCH, CURRENT_ACC, test_acc)\n", - " )\n", - " CURRENT_ACC = test_acc\n", - " CURRENT_CHECKPOINT = 0\n", - " else:\n", - " CURRENT_CHECKPOINT += 1\n", - " \n", - " print('time taken:', time.time()-lasttime)\n", - " print('epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'%(EPOCH,train_loss,\n", - " train_acc,test_loss,\n", - " test_acc))" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/text-similarity/6.bert-base-circle-loss.ipynb b/text-similarity/6.bert-base-circle-loss.ipynb new file mode 100644 index 0000000..35bc2be --- /dev/null +++ b/text-similarity/6.bert-base-circle-loss.ipynb @@ -0,0 +1,563 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# 
!wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '2'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:87: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "import pandas as pd\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'\n", + "\n", + "tokenization.validate_case_matches_checkpoint(True, '')\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file=BERT_VOCAB, do_lower_case=False)\n", + "MAX_SEQ_LENGTH = 120" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + "\n", + "def get_data(left, right):\n", + " input_ids, input_masks, segment_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens_a = tokenizer.tokenize(left[i])\n", + " tokens_b = tokenizer.tokenize(right[i])\n", + " _truncate_seq_pair(tokens_a, tokens_b, MAX_SEQ_LENGTH - 3)\n", + " tokens = []\n", + " segment_id = []\n", + " tokens.append(\"[CLS]\")\n", + " segment_id.append(0)\n", + " for token in tokens_a:\n", + " tokens.append(token)\n", + " segment_id.append(0)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(0)\n", + " for token in tokens_b:\n", + " tokens.append(token)\n", + " segment_id.append(1)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(1)\n", + " input_id = 
tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " while len(input_id) < MAX_SEQ_LENGTH:\n", + " input_id.append(0)\n", + " input_mask.append(0)\n", + " segment_id.append(0)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " return input_ids, input_masks, segment_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [02:41<00:00, 1623.32it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:08<00:00, 1575.46it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "epoch = 10\n", + "batch_size = 60\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(left) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", 
+ " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " \n", + " model = modeling.BertModel(\n", + " config=bert_config,\n", + " is_training=True,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " self.out = tf.layers.dense(output_layer, bert_config.hidden_size)\n", + " self.out = tf.nn.l2_normalize(self.out, 1)\n", + " self.logits = tf.layers.dense(self.out,dimension_output,use_bias=False,\n", + " kernel_constraint=tf.keras.constraints.unit_norm())\n", + " \n", + " self.gamma = 64\n", + " self.margin = 0.25\n", + " self.O_p = 1 + self.margin\n", + " self.O_n = -self.margin\n", + " self.Delta_p = 1 - self.margin\n", + " self.Delta_n = self.margin\n", + " \n", + " self.batch_idxs = tf.expand_dims(\n", + " tf.range(0, self.batch_size, dtype=tf.int32), 1) # shape [batch,1]\n", + " idxs = tf.concat([self.batch_idxs, tf.cast(self.Y, tf.int32)], 1)\n", + " sp = tf.expand_dims(tf.gather_nd(self.logits, idxs), 1)\n", + " mask = tf.logical_not(\n", + " tf.scatter_nd(idxs, tf.ones(tf.shape(idxs)[0], tf.bool),\n", + " tf.shape(self.logits)))\n", + "\n", + " sn = tf.reshape(tf.boolean_mask(self.logits, mask), (self.batch_size, -1))\n", + "\n", + " alpha_p = tf.nn.relu(self.O_p - tf.stop_gradient(sp))\n", + " alpha_n = tf.nn.relu(tf.stop_gradient(sn) - self.O_n)\n", + "\n", + " r_sp_m = alpha_p * (sp - self.Delta_p)\n", + " r_sn_m = alpha_n * (sn - self.Delta_n)\n", + " _Z = tf.concat([r_sn_m, r_sp_m], 1)\n", + " _Z = _Z * self.gamma\n", + " # sum all similarity\n", + " logZ = tf.math.reduce_logsumexp(_Z, 1, 
keepdims=True)\n", + " # remove sn_p from all sum similarity\n", + " self.cost = -r_sp_m * self.gamma + logZ\n", + " self.cost = tf.reduce_mean(self.cost[:,0])\n", + " \n", + " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", + " num_train_steps, num_warmup_steps, False)\n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y[:,0]\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. This can '\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py:1475: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:27: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:32: The name tf.train.polynomial_decay is deprecated. 
Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/bert/optimization.py:70: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.\n", + "\n", + "INFO:tensorflow:Restoring parameters from cased_L-12_H-768_A-12/bert_model.ckpt\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "learning_rate = 2e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + " learning_rate\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())\n", + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, BERT_INIT_CHKPNT)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 91%|█████████▏| 3992/4364 [34:19<03:10, 1.95it/s, accuracy=0.883, cost=6.12]IOPub message rate exceeded.\n", + "The notebook server will temporarily stop sending output\n", + "to the client in order to avoid crashing it.\n", + "To change this limit, set the config variable\n", + "`--NotebookApp.iopub_msg_rate_limit`.\n", + "\n", + "Current values:\n", + "NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)\n", + "NotebookApp.rate_limit_window=3.0 (secs)\n", + "\n", + "train minibatch loop: 100%|██████████| 4364/4364 [37:26<00:00, 1.94it/s, accuracy=0.955, cost=3.35]\n", + "test minibatch loop: 100%|██████████| 224/224 
[00:40<00:00, 5.52it/s, accuracy=1, cost=0.312] " + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "time taken: 2287.008446931839\n", + "epoch: 1, training loss: 5.740891, training acc: 0.911401, valid loss: 6.376333, valid acc: 0.899777\n", + "\n", + "break epoch:2\n", + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "import time\n", + "\n", + "EARLY_STOPPING, CURRENT_CHECKPOINT, CURRENT_ACC, EPOCH = 1, 0, 0, 0\n", + "\n", + "while True:\n", + " lasttime = time.time()\n", + " if CURRENT_CHECKPOINT == EARLY_STOPPING:\n", + " print('break epoch:%d\\n' % (EPOCH))\n", + " break\n", + "\n", + " train_acc, train_loss, test_acc, test_loss = [], [], [], []\n", + " pbar = tqdm(\n", + " range(0, len(train_ids), batch_size), desc = 'train minibatch loop'\n", + " )\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(train_ids))\n", + " batch_x = train_ids[i: index]\n", + " batch_masks = train_masks[i: index]\n", + " batch_segment = segment_train[i: index]\n", + " batch_y = train_Y[i: index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " acc, cost, _ = sess.run(\n", + " [model.accuracy, model.cost, model.optimizer],\n", + " feed_dict = {\n", + " model.Y: batch_y,\n", + " model.X: batch_x,\n", + " model.segment_ids: batch_segment,\n", + " model.input_masks: batch_masks\n", + " },\n", + " )\n", + " assert not np.isnan(cost)\n", + " train_loss.append(cost)\n", + " train_acc.append(acc)\n", + " pbar.set_postfix(cost = cost, accuracy = acc)\n", + " \n", + " pbar = tqdm(range(0, len(test_ids), batch_size), desc = 'test minibatch loop')\n", + " for i in pbar:\n", + " index = min(i + batch_size, len(test_ids))\n", + " batch_x = test_ids[i: index]\n", + " batch_masks = test_masks[i: index]\n", + " batch_segment = segment_test[i: index]\n", + " batch_y = test_Y[i: index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " acc, cost = sess.run(\n", + " 
[model.accuracy, model.cost],\n", + " feed_dict = {\n", + " model.Y: batch_y,\n", + " model.X: batch_x,\n", + " model.segment_ids: batch_segment,\n", + " model.input_masks: batch_masks\n", + " },\n", + " )\n", + " test_loss.append(cost)\n", + " test_acc.append(acc)\n", + " pbar.set_postfix(cost = cost, accuracy = acc)\n", + "\n", + " train_loss = np.mean(train_loss)\n", + " train_acc = np.mean(train_acc)\n", + " test_loss = np.mean(test_loss)\n", + " test_acc = np.mean(test_acc)\n", + " \n", + " if test_acc > CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "test minibatch loop: 100%|██████████| 224/224 [00:40<00:00, 5.51it/s, accuracy=1, cost=0.3] \n" + ] + } + ], + "source": [ + "test_acc, test_loss = [], []\n", + "\n", + "pbar = tqdm(range(0, len(test_ids), batch_size), desc = 'test minibatch loop')\n", + "for i in pbar:\n", + " index = min(i + batch_size, len(test_ids))\n", + " batch_x = test_ids[i: index]\n", + " batch_masks = test_masks[i: index]\n", + " batch_segment = segment_test[i: index]\n", + " batch_y = test_Y[i: index]\n", + " batch_y = np.expand_dims(batch_y,1)\n", + " acc, cost = sess.run(\n", + " [model.accuracy, model.cost],\n", + " feed_dict = {\n", + " model.Y: batch_y,\n", + " model.X: batch_x,\n", + " model.segment_ids: batch_segment,\n", + " model.input_masks: batch_masks\n", + " },\n", + " )\n", + " test_loss.append(cost)\n", + " 
test_acc.append(acc)\n", + " pbar.set_postfix(cost = cost, accuracy = acc)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(6.366118, 0.8990327)" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "test_loss = np.mean(test_loss)\n", + "test_acc = np.mean(test_acc)\n", + "test_loss, test_acc" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/6.bert.ipynb b/text-similarity/6.bert.ipynb deleted file mode 100644 index 7032275..0000000 --- a/text-similarity/6.bert.ipynb +++ /dev/null @@ -1,621 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv\n", - "# !pip3 install bert-tensorflow --user\n", - "# !wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip\n", - "# !unzip uncased_L-12_H-768_A-12.zip" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import bert\n", - "from bert import run_classifier\n", - "from bert import optimization\n", - "from bert import tokenization\n", - "from bert import modeling\n", - "import numpy as np\n", - "import tensorflow as tf\n", - "import pandas as pd\n", - "from tqdm import tqdm" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - 
"outputs": [], - "source": [ - "BERT_VOCAB = 'uncased_L-12_H-768_A-12/vocab.txt'\n", - "BERT_INIT_CHKPNT = 'uncased_L-12_H-768_A-12/bert_model.ckpt'\n", - "BERT_CONFIG = 'uncased_L-12_H-768_A-12/bert_config.json'\n", - "\n", - "tokenization.validate_case_matches_checkpoint(True, '')\n", - "tokenizer = tokenization.FullTokenizer(\n", - " vocab_file=BERT_VOCAB, do_lower_case=True)\n", - "MAX_SEQ_LENGTH = 100" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idqid1qid2question1question2is_duplicate
0012What is the step by step guide to invest in sh...What is the step by step guide to invest in sh...0
1134What is the story of Kohinoor (Koh-i-Noor) Dia...What would happen if the Indian government sto...0
2256How can I increase the speed of my internet co...How can Internet speed be increased by hacking...0
3378Why am I mentally very lonely? How can I solve...Find the remainder when [math]23^{24}[/math] i...0
44910Which one dissolve in water quikly sugar, salt...Which fish would survive in salt water?0
\n", - "
" - ], - "text/plain": [ - " id qid1 qid2 question1 \\\n", - "0 0 1 2 What is the step by step guide to invest in sh... \n", - "1 1 3 4 What is the story of Kohinoor (Koh-i-Noor) Dia... \n", - "2 2 5 6 How can I increase the speed of my internet co... \n", - "3 3 7 8 Why am I mentally very lonely? How can I solve... \n", - "4 4 9 10 Which one dissolve in water quikly sugar, salt... \n", - "\n", - " question2 is_duplicate \n", - "0 What is the step by step guide to invest in sh... 0 \n", - "1 What would happen if the Indian government sto... 0 \n", - "2 How can Internet speed be increased by hacking... 0 \n", - "3 Find the remainder when [math]23^{24}[/math] i... 0 \n", - "4 Which fish would survive in salt water? 0 " - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df = pd.read_csv('quora_duplicate_questions.tsv', delimiter='\\t').dropna()\n", - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "left, right, label = df['question1'].tolist(), df['question2'].tolist(), df['is_duplicate'].tolist()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 404287/404287 [02:58<00:00, 2262.11it/s]\n" - ] - } - ], - "source": [ - "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", - " while True:\n", - " total_length = len(tokens_a) + len(tokens_b)\n", - " if total_length <= max_length:\n", - " break\n", - " if len(tokens_a) > len(tokens_b):\n", - " tokens_a.pop()\n", - " else:\n", - " tokens_b.pop()\n", - "\n", - "input_ids, input_masks, segment_ids = [], [], []\n", - "\n", - "for i in tqdm(range(len(left))):\n", - " tokens_a = tokenizer.tokenize(left[i])\n", - " tokens_b = tokenizer.tokenize(right[i])\n", - " _truncate_seq_pair(tokens_a, tokens_b, MAX_SEQ_LENGTH - 3)\n", - " \n", - " tokens = []\n", - " 
segment_id = []\n", - " tokens.append(\"[CLS]\")\n", - " segment_id.append(0)\n", - " for token in tokens_a:\n", - " tokens.append(token)\n", - " segment_id.append(0)\n", - " tokens.append(\"[SEP]\")\n", - " segment_id.append(0)\n", - " for token in tokens_b:\n", - " tokens.append(token)\n", - " segment_id.append(1)\n", - " tokens.append(\"[SEP]\")\n", - " segment_id.append(1)\n", - " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", - " input_mask = [1] * len(input_id)\n", - " \n", - " while len(input_id) < MAX_SEQ_LENGTH:\n", - " input_id.append(0)\n", - " input_mask.append(0)\n", - " segment_id.append(0)\n", - " \n", - " input_ids.append(input_id)\n", - " input_masks.append(input_mask)\n", - " segment_ids.append(segment_id)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "epoch = 10\n", - "batch_size = 60\n", - "warmup_proportion = 0.1\n", - "num_train_steps = int(len(left) / batch_size * epoch)\n", - "num_warmup_steps = int(num_train_steps * warmup_proportion)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "class Model:\n", - " def __init__(\n", - " self,\n", - " dimension_output,\n", - " learning_rate = 2e-5,\n", - " ):\n", - " self.X = tf.placeholder(tf.int32, [None, None])\n", - " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", - " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", - " self.Y = tf.placeholder(tf.int32, [None])\n", - " \n", - " model = modeling.BertModel(\n", - " config=bert_config,\n", - " is_training=True,\n", - " input_ids=self.X,\n", - " input_mask=self.input_masks,\n", - " token_type_ids=self.segment_ids,\n", - " use_one_hot_embeddings=False)\n", - " \n", - " output_layer = model.get_pooled_output()\n", - " 
self.logits = tf.layers.dense(output_layer, dimension_output)\n", - " self.logits = tf.identity(self.logits, name = 'logits')\n", - " \n", - " self.cost = tf.reduce_mean(\n", - " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", - " logits = self.logits, labels = self.Y\n", - " )\n", - " )\n", - " \n", - " self.optimizer = optimization.create_optimizer(self.cost, learning_rate, \n", - " num_train_steps, num_warmup_steps, False)\n", - " correct_pred = tf.equal(\n", - " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", - " )\n", - " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "\n", - "WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", - "For more information, please see:\n", - " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", - " * https://github.com/tensorflow/addons\n", - "If you depend on functionality not listed there, please file an issue.\n", - "\n", - "WARNING:tensorflow:From /home/jupyter/.local/lib/python3.6/site-packages/bert/modeling.py:358: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Please use `rate` instead of `keep_prob`. 
Rate should be set to `rate = 1 - keep_prob`.\n", - "WARNING:tensorflow:From /home/jupyter/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dense instead.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Deprecated in favor of operator or tf.math.divide.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use tf.cast instead.\n", - "WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use standard file APIs to check for files with this prefix.\n", - "INFO:tensorflow:Restoring parameters from uncased_L-12_H-768_A-12/bert_model.ckpt\n" - ] - } - ], - "source": [ - "dimension_output = 2\n", - "learning_rate = 1e-5\n", - "\n", - "tf.reset_default_graph()\n", - "sess = tf.InteractiveSession()\n", - "model = Model(\n", - " dimension_output,\n", - " learning_rate\n", - ")\n", - "\n", - "sess.run(tf.global_variables_initializer())\n", - "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert')\n", - "saver = tf.train.Saver(var_list = var_lists)\n", - "saver.restore(sess, BERT_INIT_CHKPNT)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - 
"\n", - "train_input_ids, test_input_ids, train_input_masks, test_input_masks, train_segment_ids, test_segment_ids, train_Y, test_Y = train_test_split(\n", - " input_ids, input_masks, segment_ids, label, test_size = 0.2\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "train minibatch loop: 100%|██████████| 5391/5391 [35:52<00:00, 2.88it/s, accuracy=0.966, cost=0.205]\n", - "test minibatch loop: 100%|██████████| 1348/1348 [03:06<00:00, 7.23it/s, accuracy=0.868, cost=0.271]\n", - "train minibatch loop: 0%| | 0/5391 [00:00 CURRENT_ACC:\n", - " print(\n", - " 'epoch: %d, pass acc: %f, current acc: %f'\n", - " % (EPOCH, CURRENT_ACC, test_acc)\n", - " )\n", - " CURRENT_ACC = test_acc\n", - " CURRENT_CHECKPOINT = 0\n", - " else:\n", - " CURRENT_CHECKPOINT += 1\n", - " \n", - " print('time taken:', time.time() - lasttime)\n", - " print(\n", - " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", - " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", - " )\n", - " EPOCH += 1" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/text-similarity/7.electra-base-cross-entropy.ipynb b/text-similarity/7.electra-base-cross-entropy.ipynb new file mode 100644 index 0000000..1b2d11e --- /dev/null +++ b/text-similarity/7.electra-base-cross-entropy.ipynb @@ -0,0 +1,570 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "## Make sure run this notebook in electra repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/google-research/electra.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2020-07-19 19:22:35-- https://storage.googleapis.com/electra-data/electra_base.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.130.128, 172.217.194.128, 2404:6800:4003:c00::80, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.130.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 885890161 (845M) [application/zip]\n", + "Saving to: ‘electra_base.zip’\n", + "\n", + "electra_base.zip 100%[===================>] 844.85M 22.2MB/s in 48s \n", + "\n", + "2020-07-19 19:23:24 (17.5 MB/s) - ‘electra_base.zip’ saved [885890161/885890161]\n", + "\n", + "Archive: electra_base.zip\n", + " creating: electra_base/\n", + " inflating: electra_base/electra_base.meta \n", + " inflating: electra_base/electra_base.index \n", + " inflating: electra_base/checkpoint \n", + " inflating: electra_base/vocab.txt \n", + " inflating: electra_base/electra_base.data-00000-of-00001 \n", + "checkpoint\t\t\t electra_base.index vocab.txt\n", + "electra_base.data-00000-of-00001 electra_base.meta\n" + ] + } + ], + "source": [ + "!wget https://storage.googleapis.com/electra-data/electra_base.zip\n", + "!unzip electra_base.zip\n", + "!ls electra_base/" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'vocab_size': 30522,\n", + " 'hidden_size': 768,\n", + " 
'num_hidden_layers': 12,\n", + " 'num_attention_heads': 12,\n", + " 'hidden_act': 'gelu',\n", + " 'intermediate_size': 3072,\n", + " 'hidden_dropout_prob': 0.1,\n", + " 'attention_probs_dropout_prob': 0.1,\n", + " 'max_position_embeddings': 512,\n", + " 'type_vocab_size': 2,\n", + " 'initializer_range': 0.02}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import configure_finetuning\n", + "from util import training_utils\n", + "\n", + "hparams = {'model_size': 'base', 'vocab_size': 30522}\n", + "config = configure_finetuning.FinetuningConfig('electra-base', './electra_base/', **hparams)\n", + "bert_config = training_utils.get_bert_config(config)\n", + "\n", + "bert_config.__dict__" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from model import modeling\n", + "from model import optimization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "MAX_SEQ_LENGTH = 120" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from model import tokenization\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file='electra_base/vocab.txt',\n", + " do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('../text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "from tqdm import tqdm\n", + "\n", + "def _truncate_seq_pair(tokens_a, tokens_b, 
max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + "\n", + "def get_data(left, right):\n", + " input_ids, input_masks, segment_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens_a = tokenizer.tokenize(left[i])\n", + " tokens_b = tokenizer.tokenize(right[i])\n", + " _truncate_seq_pair(tokens_a, tokens_b, MAX_SEQ_LENGTH - 3)\n", + " tokens = []\n", + " segment_id = []\n", + " tokens.append(\"[CLS]\")\n", + " segment_id.append(0)\n", + " for token in tokens_a:\n", + " tokens.append(token)\n", + " segment_id.append(0)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(0)\n", + " for token in tokens_b:\n", + " tokens.append(token)\n", + " segment_id.append(1)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(1)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " while len(input_id) < MAX_SEQ_LENGTH:\n", + " input_id.append(0)\n", + " input_mask.append(0)\n", + " segment_id.append(0)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " return input_ids, input_masks, segment_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [03:03<00:00, 1425.14it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", 
+ "text": [ + "100%|██████████| 13395/13395 [00:09<00:00, 1446.81it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "from finetune import task_builder" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'chunk'" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tasks = task_builder.get_tasks(config)\n", + "tasks[0].name" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 60\n", + "epoch = 10\n", + "num_train_steps = int(len(train_ids) / batch_size * epoch)\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " model = modeling.BertModel(\n", + " bert_config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " \n", + " with tf.variable_scope(\"task_specific/classify\"):\n", + " self.logits = tf.layers.dense(output_layer, dimension_output)\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " self.optimizer = 
optimization.create_optimizer(\n", + " self.cost, config.learning_rate, num_train_steps,\n", + " weight_decay_rate=config.weight_decay_rate,\n", + " use_tpu=config.use_tpu,\n", + " warmup_proportion=config.warmup_proportion,\n", + " layerwise_lr_decay_power=config.layerwise_lr_decay,\n", + " n_transformer_layers=bert_config.num_hidden_layers\n", + " )\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/text-similarity/electra/model/modeling.py:698: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.Dense instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:187: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/ops/clip_ops.py:301: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": 
"stream", + "text": [ + "INFO:tensorflow:Restoring parameters from electra_base/electra_base\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'electra')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, 'electra_base/electra_base')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 0%| | 0/4364 [00:30 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/8.electra-base-circle-loss.ipynb 
b/text-similarity/8.electra-base-circle-loss.ipynb new file mode 100644 index 0000000..79286f2 --- /dev/null +++ b/text-similarity/8.electra-base-circle-loss.ipynb @@ -0,0 +1,558 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Make sure to run this notebook in the electra repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/google-research/electra.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/electra-data/electra_base.zip\n", + "# !unzip electra_base.zip\n", + "# !ls electra_base/" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'vocab_size': 30522,\n", + " 'hidden_size': 768,\n", + " 'num_hidden_layers': 12,\n", + " 'num_attention_heads': 12,\n", + " 'hidden_act': 'gelu',\n", + " 'intermediate_size': 3072,\n", + " 'hidden_dropout_prob': 0.1,\n", + " 'attention_probs_dropout_prob': 0.1,\n", + " 'max_position_embeddings': 512,\n", + " 'type_vocab_size': 2,\n", + " 'initializer_range': 0.02}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import configure_finetuning\n", + "from util import training_utils\n", + "\n", + "hparams = {'model_size': 'base', 'vocab_size': 30522}\n", + "config = configure_finetuning.FinetuningConfig('electra-base', './electra_base/', **hparams)\n", + "bert_config = training_utils.get_bert_config(config)\n", + "\n", + "bert_config.__dict__" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "from model import 
modeling\n", + "from model import optimization" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "MAX_SEQ_LENGTH = 120" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from model import tokenization\n", + "\n", + "tokenizer = tokenization.FullTokenizer(\n", + " vocab_file='electra_base/vocab.txt',\n", + " do_lower_case=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('../text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "from tqdm import tqdm\n", + "\n", + "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + "\n", + "def get_data(left, right):\n", + " input_ids, input_masks, segment_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens_a = tokenizer.tokenize(left[i])\n", + " tokens_b = tokenizer.tokenize(right[i])\n", + " _truncate_seq_pair(tokens_a, tokens_b, MAX_SEQ_LENGTH - 3)\n", + " tokens = []\n", + " segment_id = []\n", + " tokens.append(\"[CLS]\")\n", + " segment_id.append(0)\n", + " for token in tokens_a:\n", + " tokens.append(token)\n", + " segment_id.append(0)\n", + " tokens.append(\"[SEP]\")\n", + " segment_id.append(0)\n", + " for token in tokens_b:\n", + " tokens.append(token)\n", + " segment_id.append(1)\n", + " tokens.append(\"[SEP]\")\n", + " 
segment_id.append(1)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " while len(input_id) < MAX_SEQ_LENGTH:\n", + " input_id.append(0)\n", + " input_mask.append(0)\n", + " segment_id.append(0)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " return input_ids, input_masks, segment_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [02:42<00:00, 1608.50it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:08<00:00, 1669.02it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "from finetune import task_builder" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'chunk'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tasks = task_builder.get_tasks(config)\n", + "tasks[0].name" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 
60\n", + "epoch = 10\n", + "num_train_steps = int(len(train_ids) / batch_size * epoch)\n", + "\n", + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.int32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None, None])\n", + " self.batch_size = tf.shape(self.X)[0]\n", + " \n", + " model = modeling.BertModel(\n", + " bert_config=bert_config,\n", + " is_training=False,\n", + " input_ids=self.X,\n", + " input_mask=self.input_masks,\n", + " token_type_ids=self.segment_ids,\n", + " use_one_hot_embeddings=False)\n", + " \n", + " output_layer = model.get_pooled_output()\n", + " \n", + " with tf.variable_scope(\"task_specific/classify\"):\n", + " self.out = tf.layers.dense(output_layer, bert_config.hidden_size)\n", + " self.out = tf.nn.l2_normalize(self.out, 1)\n", + " self.logits = tf.layers.dense(self.out,dimension_output,use_bias=False,\n", + " kernel_constraint=tf.keras.constraints.unit_norm())\n", + " \n", + " self.gamma = 64\n", + " self.margin = 0.25\n", + " self.O_p = 1 + self.margin\n", + " self.O_n = -self.margin\n", + " self.Delta_p = 1 - self.margin\n", + " self.Delta_n = self.margin\n", + " \n", + " self.batch_idxs = tf.expand_dims(\n", + " tf.range(0, self.batch_size, dtype=tf.int32), 1) # shape [batch,1]\n", + " idxs = tf.concat([self.batch_idxs, tf.cast(self.Y, tf.int32)], 1)\n", + " sp = tf.expand_dims(tf.gather_nd(self.logits, idxs), 1)\n", + " mask = tf.logical_not(\n", + " tf.scatter_nd(idxs, tf.ones(tf.shape(idxs)[0], tf.bool),\n", + " tf.shape(self.logits)))\n", + "\n", + " sn = tf.reshape(tf.boolean_mask(self.logits, mask), (self.batch_size, -1))\n", + "\n", + " alpha_p = tf.nn.relu(self.O_p - tf.stop_gradient(sp))\n", + " alpha_n = tf.nn.relu(tf.stop_gradient(sn) - self.O_n)\n", + "\n", + " r_sp_m = alpha_p * (sp - self.Delta_p)\n", + " 
r_sn_m = alpha_n * (sn - self.Delta_n)\n", + " _Z = tf.concat([r_sn_m, r_sp_m], 1)\n", + " _Z = _Z * self.gamma\n", + " # sum all similarity\n", + " logZ = tf.math.reduce_logsumexp(_Z, 1, keepdims=True)\n", + " # remove sn_p from all sum similarity\n", + " self.cost = -r_sp_m * self.gamma + logZ\n", + " self.cost = tf.reduce_mean(self.cost[:,0])\n", + " \n", + " self.optimizer = optimization.create_optimizer(\n", + " self.cost, config.learning_rate, num_train_steps,\n", + " weight_decay_rate=config.weight_decay_rate,\n", + " use_tpu=config.use_tpu,\n", + " warmup_proportion=config.warmup_proportion,\n", + " layerwise_lr_decay_power=config.layerwise_lr_decay,\n", + " n_transformer_layers=bert_config.num_hidden_layers\n", + " )\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y[:,0]\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py:1750: UserWarning: An interactive session is already active. This can cause out-of-memory errors in some cases. You must explicitly call `InteractiveSession.close()` to release resources held by the other session(s).\n", + " warnings.warn('An interactive session is already active. 
This can '\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(\n", + " dimension_output,\n", + ")\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from electra_base/electra_base\n" + ] + } + ], + "source": [ + "var_lists = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'electra')\n", + "saver = tf.train.Saver(var_list = var_lists)\n", + "saver.restore(sess, 'electra_base/electra_base')" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 4364/4364 [34:51<00:00, 2.09it/s, accuracy=0.909, cost=6.76] \n", + "test minibatch loop: 100%|██████████| 224/224 [00:35<00:00, 6.37it/s, accuracy=1, cost=0.0269] \n", + "train minibatch loop: 0%| | 0/4364 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + }, + { + "cell_type": "code", + 
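The circle-loss cell in the notebook above builds the loss out of `sp` (the logit of the true class), `sn` (the remaining logits), the self-paced weights `alpha_p`/`alpha_n`, and a logsumexp that, after subtracting `gamma * r_sp`, reduces to `log(1 + sum_n exp(gamma * (r_sn - r_sp)))`. A plain-NumPy restatement makes that easier to sanity-check; this is a sketch with the same `gamma`/`margin` constants and toy similarity scores, not the notebook's TF graph (the function name and example inputs are invented for illustration):

```python
import numpy as np

def circle_loss(logits, labels, gamma=64, margin=0.25):
    # logits: (batch, classes) similarity scores (unit-norm dense output,
    # so roughly in [-1, 1]); labels: (batch,) integer class ids.
    O_p, O_n = 1 + margin, -margin           # optimum points for sp / sn
    Delta_p, Delta_n = 1 - margin, margin    # decision margins
    n = logits.shape[0]
    rows = np.arange(n)
    sp = logits[rows, labels][:, None]       # within-class similarity
    mask = np.ones_like(logits, dtype=bool)
    mask[rows, labels] = False
    sn = logits[mask].reshape(n, -1)         # between-class similarities
    alpha_p = np.maximum(O_p - sp, 0.0)      # self-paced weights
    alpha_n = np.maximum(sn - O_n, 0.0)
    r_sp = alpha_p * (sp - Delta_p)
    r_sn = alpha_n * (sn - Delta_n)
    z = np.concatenate([r_sn, r_sp], axis=1) * gamma
    m = z.max(axis=1, keepdims=True)         # numerically stable logsumexp
    logZ = m[:, 0] + np.log(np.exp(z - m).sum(axis=1))
    # -gamma * r_sp + logZ == log(1 + sum_n exp(gamma * (r_sn - r_sp))) >= 0
    return float(np.mean(-gamma * r_sp[:, 0] + logZ))

# A well-separated pair (true class similar, wrong class dissimilar) should
# give a near-zero loss; a misclassified pair should give a large one.
well_separated = circle_loss(np.array([[-0.9, 0.9]]), np.array([1]))
misclassified = circle_loss(np.array([[0.9, -0.9]]), np.array([1]))
```

Note the notebook feeds labels as a `(batch, 1)` placeholder and gathers `sp` with `tf.gather_nd`; the sketch uses flat integer labels for brevity but computes the same quantity.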
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/9.xlnet-base-cross-entropy.ipynb b/text-similarity/9.xlnet-base-cross-entropy.ipynb new file mode 100644 index 0000000..be25d00 --- /dev/null +++ b/text-similarity/9.xlnet-base-cross-entropy.ipynb @@ -0,0 +1,752 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Make sure run this notebook in xlnet repo folder after git clone,\n", + "\n", + "```bash\n", + "git clone https://github.com/zihangdai/xlnet.git\n", + "cd electra\n", + "jupyter notebook\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip -O xlnet.zip\n", + "# !unzip xlnet.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '1'" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import sentencepiece as spm\n", + "from prepro_utils import preprocess_text, encode_ids\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "\n", + "def tokenize_fn(text):\n", + " text = preprocess_text(text, lower= False)\n", + " return encode_ids(sp_model, text)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + 
"SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP = 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + " \"\" : 0,\n", + " \"\" : 1,\n", + " \"\" : 2,\n", + " \"\" : 3,\n", + " \"\" : 4,\n", + " \"\" : 5,\n", + " \"\" : 6,\n", + " \"\" : 7,\n", + " \"\" : 8,\n", + "}\n", + "\n", + "VOCAB_SIZE = 32000\n", + "UNK_ID = special_symbols[\"\"]\n", + "CLS_ID = special_symbols[\"\"]\n", + "SEP_ID = special_symbols[\"\"]\n", + "MASK_ID = special_symbols[\"\"]\n", + "EOD_ID = special_symbols[\"\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['train_X', 'train_Y', 'test_X', 'test_Y'])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "\n", + "with open('../text.json') as fopen:\n", + " data = json.load(fopen)\n", + " \n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from tqdm import tqdm\n", + "\n", + "MAX_SEQ_LENGTH = 120\n", + "\n", + "def _truncate_seq_pair(tokens_a, tokens_b, max_length):\n", + " while True:\n", + " total_length = len(tokens_a) + len(tokens_b)\n", + " if total_length <= max_length:\n", + " break\n", + " if len(tokens_a) > len(tokens_b):\n", + " tokens_a.pop()\n", + " else:\n", + " tokens_b.pop()\n", + " \n", + "def get_data(left, right):\n", + " input_ids, input_mask, all_seg_ids = [], [], []\n", + " for i in tqdm(range(len(left))):\n", + " tokens = tokenize_fn(left[i])\n", + " tokens_right = tokenize_fn(right[i])\n", + " \n", + " _truncate_seq_pair(tokens, tokens_right, MAX_SEQ_LENGTH - 3)\n", + " \n", + " segment_ids = [SEG_ID_A] * len(tokens)\n", + " tokens.append(SEP_ID)\n", + " segment_ids.append(SEG_ID_A)\n", + "\n", + " tokens.extend(tokens_right)\n", + " segment_ids.extend([SEG_ID_B] * len(tokens_right))\n", + " tokens.append(SEP_ID)\n", + " 
segment_ids.append(SEG_ID_B)\n", + "\n", + " tokens.append(CLS_ID)\n", + " segment_ids.append(SEG_ID_CLS)\n", + "\n", + " cur_input_ids = tokens\n", + " cur_input_mask = [0] * len(cur_input_ids)\n", + " assert len(tokens) == len(cur_input_mask)\n", + " assert len(tokens) == len(segment_ids)\n", + " input_ids.append(tokens)\n", + " input_mask.append(cur_input_mask)\n", + " all_seg_ids.append(segment_ids)\n", + " return input_ids, input_mask, all_seg_ids" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [01:12<00:00, 3616.45it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['train_X'])):\n", + " l, r = data['train_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "train_ids, train_masks, segment_train = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:04<00:00, 2813.06it/s]\n" + ] + } + ], + "source": [ + "left, right = [], []\n", + "for i in range(len(data['test_X'])):\n", + " l, r = data['test_X'][i].split(' <> ')\n", + " left.append(l)\n", + " right.append(r)\n", + " \n", + "test_ids, test_masks, segment_test = get_data(left, right)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:295: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/xlnet.py:63: The name tf.gfile.Open is deprecated. 
Please use tf.io.gfile.GFile instead.\n", + "\n" + ] + } + ], + "source": [ + "import xlnet\n", + "import model_utils\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "\n", + "kwargs = dict(\n", + " is_training=True,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.1,\n", + " dropatt=0.1,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.05,\n", + " clamp_len=-1)\n", + "\n", + "xlnet_parameters = xlnet.RunConfig(**kwargs)\n", + "xlnet_config = xlnet.XLNetConfig(json_path='xlnet_cased_L-12_H-768_A-12/xlnet_config.json')" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "122719 12271\n" + ] + } + ], + "source": [ + "epoch = 15\n", + "batch_size = 32\n", + "warmup_proportion = 0.1\n", + "num_train_steps = int(len(train_ids) / batch_size * epoch)\n", + "num_warmup_steps = int(num_train_steps * warmup_proportion)\n", + "print(num_train_steps, num_warmup_steps)\n", + "\n", + "training_parameters = dict(\n", + " decay_method = 'poly',\n", + " train_steps = num_train_steps,\n", + " learning_rate = 2e-5,\n", + " warmup_steps = num_warmup_steps,\n", + " min_lr_ratio = 0.0,\n", + " weight_decay = 0.00,\n", + " adam_epsilon = 1e-8,\n", + " num_core_per_host = 1,\n", + " lr_layer_decay_rate = 1,\n", + " use_tpu=False,\n", + " use_bfloat16=False,\n", + " dropout=0.0,\n", + " dropatt=0.0,\n", + " init='normal',\n", + " init_range=0.1,\n", + " init_std=0.02,\n", + " clip = 1.0,\n", + " clamp_len=-1,)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "class Parameter:\n", + " def __init__(self, decay_method, warmup_steps, weight_decay, adam_epsilon, \n", + " num_core_per_host, lr_layer_decay_rate, use_tpu, learning_rate, train_steps,\n", + " min_lr_ratio, clip, **kwargs):\n", + " self.decay_method = decay_method\n", + " self.warmup_steps = warmup_steps\n", + " 
self.weight_decay = weight_decay\n", + " self.adam_epsilon = adam_epsilon\n", + " self.num_core_per_host = num_core_per_host\n", + " self.lr_layer_decay_rate = lr_layer_decay_rate\n", + " self.use_tpu = use_tpu\n", + " self.learning_rate = learning_rate\n", + " self.train_steps = train_steps\n", + " self.min_lr_ratio = min_lr_ratio\n", + " self.clip = clip\n", + " \n", + "training_parameters = Parameter(**training_parameters)" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "class Model:\n", + " def __init__(\n", + " self,\n", + " dimension_output,\n", + " learning_rate = 2e-5,\n", + " ):\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + " self.Y = tf.placeholder(tf.int32, [None])\n", + " \n", + " xlnet_model = xlnet.XLNetModel(\n", + " xlnet_config=xlnet_config,\n", + " run_config=xlnet_parameters,\n", + " input_ids=tf.transpose(self.X, [1, 0]),\n", + " seg_ids=tf.transpose(self.segment_ids, [1, 0]),\n", + " input_mask=tf.transpose(self.input_masks, [1, 0]))\n", + " \n", + " summary = xlnet_model.get_pooled_out(\"last\", True)\n", + " print(summary)\n", + " \n", + " self.logits = tf.layers.dense(summary, dimension_output)\n", + " \n", + " self.cost = tf.reduce_mean(\n", + " tf.nn.sparse_softmax_cross_entropy_with_logits(\n", + " logits = self.logits, labels = self.Y\n", + " )\n", + " )\n", + " self.optimizer, self.learning_rate, _ = model_utils.get_train_op(training_parameters, self.cost)\n", + " \n", + " correct_pred = tf.equal(\n", + " tf.argmax(self.logits, 1, output_type = tf.int32), self.Y\n", + " )\n", + " self.accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WARNING:tensorflow:From 
/home/husein/text-similarity/xlnet/xlnet.py:220: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/xlnet.py:220: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:453: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "INFO:tensorflow:memory input None\n", + "INFO:tensorflow:Use float type \n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:460: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:535: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "WARNING:tensorflow:From /home/husein/.local/lib/python3.6/site-packages/tensorflow_core/python/layers/core.py:271: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Please use `layer.__call__` method instead.\n", + "WARNING:tensorflow:\n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use 
keras.layers.Dense instead.\n", + "Tensor(\"model_1/sequnece_summary/dropout/dropout/mul_1:0\", shape=(?, 768), dtype=float32)\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:96: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:108: The name tf.train.polynomial_decay is deprecated. Please use tf.compat.v1.train.polynomial_decay instead.\n", + "\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:123: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", + "WARNING:tensorflow:From /home/husein/text-similarity/xlnet/model_utils.py:131: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.\n", + "\n" + ] + } + ], + "source": [ + "dimension_output = 2\n", + "learning_rate = 2e-5\n", + "\n", + "tf.reset_default_graph()\n", + "sess = tf.InteractiveSession()\n", + "model = Model(dimension_output, learning_rate)\n", + "\n", + "sess.run(tf.global_variables_initializer())" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = 
tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "tvars = tf.trainable_variables()\n", + "checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "assignment_map, initialized_variable_names = get_assignment_map_from_checkpoint(tvars, \n", + " checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO:tensorflow:Restoring parameters from xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt\n" + ] + } + ], + "source": [ + "saver = tf.train.Saver(var_list = assignment_map)\n", + "saver.restore(sess, checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "labels = ['contradiction', 'entailment']\n", + "\n", + "train_Y = data['train_Y']\n", + "test_Y = data['test_Y']\n", + "\n", + "train_Y = [labels.index(i) for i in train_Y]\n", + "test_Y = [labels.index(i) for i in test_Y]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorflow.keras.preprocessing.sequence import pad_sequences\n", + "\n", + "batch_x = train_ids[:5]\n", + "batch_x = pad_sequences(batch_x,padding='post')\n", + "batch_y = train_Y[:5]\n", + "batch_segments = segment_train[:5]\n", + "batch_segments = pad_sequences(batch_segments, padding='post', value = 4)\n", + "batch_masks = train_masks[:5]\n", + "batch_masks = pad_sequences(batch_masks, 
padding='post', value = 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0.0, 2.4012828]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sess.run([model.accuracy, model.cost],\n", + " feed_dict = {model.X: batch_x,\n", + " model.Y: batch_y,\n", + " model.segment_ids: batch_segments,\n", + " model.input_masks: batch_masks})" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "train minibatch loop: 100%|██████████| 8182/8182 [46:18<00:00, 2.95it/s, accuracy=1, cost=0.0199] \n", + "test minibatch loop: 100%|██████████| 419/419 [00:51<00:00, 8.13it/s, accuracy=1, cost=0.0653] \n", + "train minibatch loop: 0%| | 0/8182 [00:00 CURRENT_ACC:\n", + " print(\n", + " 'epoch: %d, pass acc: %f, current acc: %f'\n", + " % (EPOCH, CURRENT_ACC, test_acc)\n", + " )\n", + " CURRENT_ACC = test_acc\n", + " CURRENT_CHECKPOINT = 0\n", + " else:\n", + " CURRENT_CHECKPOINT += 1\n", + " \n", + " print('time taken:', time.time() - lasttime)\n", + " print(\n", + " 'epoch: %d, training loss: %f, training acc: %f, valid loss: %f, valid acc: %f\\n'\n", + " % (EPOCH, train_loss, train_acc, test_loss, test_acc)\n", + " )\n", + " EPOCH += 1" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/text-similarity/README.md b/text-similarity/README.md index bc9cb66..7278e43 100644 --- a/text-similarity/README.md +++ b/text-similarity/README.md @@ -1,3 +1,4 @@ ## 
How-to -1. Run any notebook using Jupyter Notebook. +1. Run [prepare-dataset.ipynb](prepare-dataset.ipynb). +2. Run any notebook. diff --git a/text-similarity/prepare-dataset.ipynb b/text-similarity/prepare-dataset.ipynb new file mode 100644 index 0000000..3f68599 --- /dev/null +++ b/text-similarity/prepare-dataset.ipynb @@ -0,0 +1,361 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://cims.nyu.edu/~sbowman/multinli/multinli_1.0.zip\n", + "# !unzip -o multinli_1.0.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "from unidecode import unidecode\n", + "\n", + "def cleaning(string):\n", + " return re.sub(r'[ ]+', ' ', unidecode(string)).strip()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['multinli_1.0/multinli_1.0_dev_mismatched.jsonl',\n", + " 'multinli_1.0/multinli_1.0_train.jsonl',\n", + " 'multinli_1.0/multinli_1.0_dev_matched.jsonl']" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from glob import glob\n", + "\n", + "files = glob('multinli_1.0/multinli_1.0_*.jsonl')\n", + "files" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "with open(files[1]) as fopen:\n", + " train = fopen.read().split('\\n')\n", + " \n", + "with open(files[0]) as fopen:\n", + " dev = fopen.read().split('\\n')\n", + " \n", + "with open(files[2]) as fopen:\n", + " dev.extend(fopen.read().split('\\n'))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [
+ "labels = ['contradiction', 'entailment']" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 392703/392703 [00:03<00:00, 115787.44it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "train_X, train_Y = [], []\n", + "\n", + "for i in tqdm(range(len(train))):\n", + " try:\n", + " l = json.loads(train[i])\n", + " if l['gold_label'] not in labels:\n", + " continue\n", + " if len(l['sentence1']) and len(l['sentence2']):\n", + " s = f\"{l['sentence1']} <> {l['sentence2']}\"\n", + " train_X.append(s)\n", + " train_Y.append(l['gold_label'])\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 20002/20002 [00:00<00:00, 93673.10it/s]\n" + ] + } + ], + "source": [ + "test_X, test_Y = [], []\n", + "\n", + "for i in tqdm(range(len(dev))):\n", + " try:\n", + " l = json.loads(dev[i])\n", + " if l['gold_label'] not in labels:\n", + " continue\n", + " if len(l['sentence1']) and len(l['sentence2']):\n", + " s = f\"{l['sentence1']} <> {l['sentence2']}\"\n", + " test_X.append(s)\n", + " test_Y.append(l['gold_label'])\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "import youtokentome as yttm\n", + "\n", + "with open('out.txt', 'w') as fopen:\n", + " fopen.write('\\n'.join(test_X + train_X))\n", + " \n", + "yttm.BPE.train(data='out.txt', vocab_size=30000, model='vocab.model')\n", + "bpe = yttm.BPE(model='vocab.model')" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['<PAD>', '<UNK>', '<BOS>', '<EOS>']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bpe.vocab()[:4]" +
] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['halo halo']" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bpe.decode(bpe.encode('halo') + [2] + bpe.encode('halo'))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [00:09<00:00, 26791.84it/s]\n" + ] + } + ], + "source": [ + "left_train, right_train, label_train = [], [], []\n", + "\n", + "for i in tqdm(range(len(train_X))):\n", + " l, r = train_X[i].split(' <> ')\n", + " left_train.append(bpe.encode(l))\n", + " right_train.append(bpe.encode(r))\n", + " label_train.append(labels.index(train_Y[i]))" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:00<00:00, 29595.87it/s]\n" + ] + } + ], + "source": [ + "left_test, right_test, label_test = [], [], []\n", + "\n", + "for i in tqdm(range(len(test_X))):\n", + " l, r = test_X[i].split(' <> ')\n", + " try:\n", + " label_test.append(labels.index(test_Y[i]))\n", + " left_test.append(bpe.encode(l))\n", + " right_test.append(bpe.encode(r))\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "with open('contrastive.json', 'w') as fopen:\n", + " json.dump({'left_train': left_train,\n", + " 'right_train': right_train,\n", + " 'label_train': label_train,\n", + " 'left_test': left_test,\n", + " 'right_test': right_test,\n", + " 'label_test': label_test}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 261802/261802 [00:09<00:00, 
26215.21it/s]\n" + ] + } + ], + "source": [ + "left_train, label_train = [], []\n", + "\n", + "for i in tqdm(range(len(train_X))):\n", + " l, r = train_X[i].split(' <> ')\n", + " left_train.append(bpe.encode(l) + [2] + bpe.encode(r))\n", + " label_train.append(labels.index(train_Y[i]))" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 13395/13395 [00:00<00:00, 13604.82it/s]\n" + ] + } + ], + "source": [ + "left_test, label_test = [], []\n", + "\n", + "for i in tqdm(range(len(test_X))):\n", + " try:\n", + " l, r = test_X[i].split(' <> ')\n", + " label_test.append(labels.index(test_Y[i]))\n", + " left_test.append(bpe.encode(l) + [2] + bpe.encode(r))\n", + " except:\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "with open('pair.json', 'w') as fopen:\n", + " json.dump({'left_train': left_train,\n", + " 'label_train': label_train,\n", + " 'left_test': left_test,\n", + " 'label_test': label_test}, fopen)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "with open('text.json', 'w') as fopen:\n", + " json.dump({'train_X': train_X,\n", + " 'train_Y': train_Y,\n", + " 'test_X': test_X,\n", + " 'test_Y': test_Y}, fopen)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tf-nlp.png b/tf-nlp.png deleted file mode 100644 index 85c7da5..0000000 Binary files a/tf-nlp.png and /dev/null differ diff --git a/embedded/5.lda2vec.ipynb 
b/topic-model/1.lda2vec.ipynb similarity index 100% rename from embedded/5.lda2vec.ipynb rename to topic-model/1.lda2vec.ipynb diff --git a/topic-model/2.bert-topic.ipynb b/topic-model/2.bert-topic.ipynb new file mode 100644 index 0000000..85ff9f3 --- /dev/null +++ b/topic-model/2.bert-topic.ipynb @@ -0,0 +1,897 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !pip3 install bert-tensorflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download BERT-Base model" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download simple dataset\n", + "\n", + "I want to use a negative sentiment corpus to build unsupervised topic models using attention from BERT."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5330" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data/negative/negative\n", + "\n", + "with open('negative') as fopen:\n", + " negative = fopen.read().split('\\n')[:-1]\n", + "len(negative)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "import itertools" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_ngram(seq, ngram = (1, 3)):\n", + " g = []\n", + " for i in range(ngram[0], ngram[-1] + 1):\n", + " g.extend(list(ngrams_generator(seq, i)))\n", + " return g\n", + "\n", + "def _pad_sequence(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " left_pad_symbol = None,\n", + " right_pad_symbol = None,\n", + "):\n", + " sequence = iter(sequence)\n", + " if pad_left:\n", + " sequence = itertools.chain((left_pad_symbol,) * (n - 1), sequence)\n", + " if pad_right:\n", + " sequence = itertools.chain(sequence, (right_pad_symbol,) * (n - 1))\n", + " return sequence\n", + "\n", + "\n", + "def ngrams_generator(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " 
left_pad_symbol = None,\n", + " right_pad_symbol = None,\n", + "):\n", + " \"\"\"\n", + " generate ngrams.\n", + "\n", + " Parameters\n", + " ----------\n", + " sequence : list of str\n", + " list of tokenize words.\n", + " n : int\n", + " ngram size\n", + "\n", + " Returns\n", + " -------\n", + " ngram: list\n", + " \"\"\"\n", + " sequence = _pad_sequence(\n", + " sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol\n", + " )\n", + "\n", + " history = []\n", + " while n > 1:\n", + " try:\n", + " next_item = next(sequence)\n", + " except StopIteration:\n", + " return\n", + " history.append(next_item)\n", + " n -= 1\n", + " for item in sequence:\n", + " history.append(item)\n", + " yield tuple(history)\n", + " del history[0]\n", + "\n", + "def merge_wordpiece_tokens(paired_tokens, weighted = True):\n", + " new_paired_tokens = []\n", + " n_tokens = len(paired_tokens)\n", + "\n", + " i = 0\n", + "\n", + " while i < n_tokens:\n", + " current_token, current_weight = paired_tokens[i]\n", + " if current_token.startswith('##'):\n", + " previous_token, previous_weight = new_paired_tokens.pop()\n", + " merged_token = previous_token\n", + " merged_weight = [previous_weight]\n", + " while current_token.startswith('##'):\n", + " merged_token = merged_token + current_token.replace('##', '')\n", + " merged_weight.append(current_weight)\n", + " i = i + 1\n", + " current_token, current_weight = paired_tokens[i]\n", + " merged_weight = np.mean(merged_weight)\n", + " new_paired_tokens.append((merged_token, merged_weight))\n", + "\n", + " else:\n", + " new_paired_tokens.append((current_token, current_weight))\n", + " i = i + 1\n", + "\n", + " words = [\n", + " i[0]\n", + " for i in new_paired_tokens\n", + " if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n", + " ]\n", + " weights = [\n", + " i[1]\n", + " for i in new_paired_tokens\n", + " if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n", + " ]\n", + " if weighted:\n", + " weights = np.array(weights)\n", + " weights = weights / 
np.sum(weights)\n", + " return list(zip(words, weights))\n", + "\n", + "def _extract_attention_weights(num_layers, tf_graph):\n", + " attns = [\n", + " {\n", + " 'layer_%s'\n", + " % i: tf_graph.get_tensor_by_name(\n", + " 'bert/encoder/layer_%s/attention/self/Softmax:0' % i\n", + " )\n", + " }\n", + " for i in range(num_layers)\n", + " ]\n", + "\n", + " return attns\n", + "\n", + "def padding_sequence(seq, maxlen, padding = 'post', pad_int = 0):\n", + " padded_seqs = []\n", + " for s in seq:\n", + " if padding == 'post':\n", + " padded_seqs.append(s + [pad_int] * (maxlen - len(s)))\n", + " if padding == 'pre':\n", + " padded_seqs.append([pad_int] * (maxlen - len(s)) + s)\n", + " return padded_seqs\n", + "\n", + "\n", + "def bert_tokenization(tokenizer, texts, cls = '[CLS]', sep = '[SEP]'):\n", + "\n", + " input_ids, input_masks, segment_ids, s_tokens = [], [], [], []\n", + " for text in texts:\n", + " tokens_a = tokenizer.tokenize(text)\n", + " tokens = [cls] + tokens_a + [sep]\n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " s_tokens.append(tokens)\n", + "\n", + " maxlen = max([len(i) for i in input_ids])\n", + " input_ids = padding_sequence(input_ids, maxlen)\n", + " input_masks = padding_sequence(input_masks, maxlen)\n", + " segment_ids = padding_sequence(segment_ids, maxlen)\n", + "\n", + " return input_ids, input_masks, segment_ids, s_tokens\n", + "\n", + "class _Model:\n", + " def __init__(self, bert_config, tokenizer):\n", + " _graph = tf.Graph()\n", + " with _graph.as_default():\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self._tokenizer = tokenizer\n", + "\n", + " self.model = modeling.BertModel(\n", + " config = bert_config,\n", + " is_training = False,\n", + " input_ids = self.X,\n", + " use_one_hot_embeddings = 
False,\n", + " )\n", + " self.logits = self.model.get_pooled_output()\n", + " self._sess = tf.InteractiveSession()\n", + " self._sess.run(tf.global_variables_initializer())\n", + " var_lists = tf.get_collection(\n", + " tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert'\n", + " )\n", + " self._saver = tf.train.Saver(var_list = var_lists)\n", + " attns = _extract_attention_weights(\n", + " bert_config.num_hidden_layers, tf.get_default_graph()\n", + " )\n", + " self.attns = attns\n", + "\n", + " def vectorize(self, strings):\n", + "\n", + " \"\"\"\n", + " Vectorize string inputs using bert attention.\n", + "\n", + " Parameters\n", + " ----------\n", + " strings : str / list of str\n", + "\n", + " Returns\n", + " -------\n", + " array: vectorized strings\n", + " \"\"\"\n", + "\n", + " if isinstance(strings, list):\n", + " if not isinstance(strings[0], str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " else:\n", + " if not isinstance(strings, str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " if isinstance(strings, str):\n", + " strings = [strings]\n", + "\n", + " batch_x, _, _, _ = bert_tokenization(self._tokenizer, strings)\n", + " return self._sess.run(self.logits, feed_dict = {self.X: batch_x})\n", + "\n", + " def attention(self, strings, method = 'last', **kwargs):\n", + " \"\"\"\n", + " Get attention string inputs from bert attention.\n", + "\n", + " Parameters\n", + " ----------\n", + " strings : str / list of str\n", + " method : str, optional (default='last')\n", + " Attention layer supported. 
Allowed values:\n", + "\n", + " * ``'last'`` - attention from last layer.\n", + " * ``'first'`` - attention from first layer.\n", + " * ``'mean'`` - average attentions from all layers.\n", + "\n", + " Returns\n", + " -------\n", + " array: attention\n", + " \"\"\"\n", + "\n", + " if isinstance(strings, list):\n", + " if not isinstance(strings[0], str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " else:\n", + " if not isinstance(strings, str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " if isinstance(strings, str):\n", + " strings = [strings]\n", + "\n", + " method = method.lower()\n", + " if method not in ['last', 'first', 'mean']:\n", + " raise Exception(\n", + " \"method not supported, only support 'last', 'first' and 'mean'\"\n", + " )\n", + "\n", + " batch_x, _, _, s_tokens = bert_tokenization(self._tokenizer, strings)\n", + " maxlen = max([len(s) for s in s_tokens])\n", + " s_tokens = padding_sequence(s_tokens, maxlen, pad_int = '[SEP]')\n", + " attentions = self._sess.run(self.attns, feed_dict = {self.X: batch_x})\n", + " if method == 'first':\n", + " cls_attn = list(attentions[0].values())[0][:, :, 0, :]\n", + "\n", + " if method == 'last':\n", + " cls_attn = list(attentions[-1].values())[0][:, :, 0, :]\n", + "\n", + " if method == 'mean':\n", + " combined_attentions = []\n", + " for a in attentions:\n", + " combined_attentions.append(list(a.values())[0])\n", + " cls_attn = np.mean(combined_attentions, axis = 0).mean(axis = 2)\n", + "\n", + " cls_attn = np.mean(cls_attn, axis = 1)\n", + " total_weights = np.sum(cls_attn, axis = -1, keepdims = True)\n", + " attn = cls_attn / total_weights\n", + " output = []\n", + " for i in range(attn.shape[0]):\n", + " output.append(\n", + " merge_wordpiece_tokens(list(zip(s_tokens[i], attn[i])))\n", + " )\n", + " return output" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + 
"output_type": "stream", + "text": [ + "W0820 00:50:25.676800 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n", + "W0820 00:50:25.755635 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "W0820 00:50:25.757595 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "W0820 00:50:25.783736 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "W0820 00:50:26.212700 139771824637760 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0820 00:50:26.247612 139771824637760 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n" + ] + } + ], + "source": [ + "tokenizer = tokenization.FullTokenizer(vocab_file=BERT_VOCAB, do_lower_case=False)\n", + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "model = 
_Model(bert_config, tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test vectorization" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0.55213624, -0.33787724, 0.74862313, ..., -0.04363263,\n", + " 0.31521446, 0.07524541],\n", + " [ 0.59046894, -0.304328 , 0.7821516 , ..., -0.16189037,\n", + " 0.367751 , 0.07440313]], dtype=float32)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v = model.vectorize(['hello nice to meet u', 'so long sucker'])\n", + "v" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 768)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test attention" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[('hello', 0.19323255),\n", + " ('nice', 0.19877374),\n", + " ('to', 0.19795448),\n", + " ('meet', 0.20197453),\n", + " ('u', 0.20806469)],\n", + " [('so', 0.34224316), ('long', 0.31957355), ('sucker', 0.3381833)]]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.attention(['hello nice to meet u', 'so long sucker'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building topic modeling" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 10\n", + "ngram = (1, 3)\n", + "n_topics = 10" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 533/533 
[01:11<00:00, 7.32it/s]\n" + ] + } + ], + "source": [ + "from sklearn.cluster import KMeans\n", + "from tqdm import tqdm\n", + "\n", + "rows, attentions = [], []\n", + "for i in tqdm(range(0, len(negative), batch_size)):\n", + " index = min(i + batch_size, len(negative))\n", + " rows.append(model.vectorize(negative[i:index]))\n", + " attentions.extend(model.attention(negative[i:index]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download simple English stopwords\n", + "\n", + "You might want to gather more stopwords." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.json" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1298" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "with open('stopwords-en.json') as fopen:\n", + " stopwords = json.load(fopen)\n", + "len(stopwords)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[\"'ll\", \"'tis\", \"'twas\", \"'ve\", '10']" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stopwords[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processed 500\n", + "processed 1000\n", + "processed 1500\n", + "processed 2000\n", + "processed 2500\n", + "processed 3000\n", + "processed 3500\n", + "processed 4000\n", + "processed 4500\n", + "processed 5000\n" + ] + } + ], + "source": [ + "concat = np.concatenate(rows, axis = 0)\n", + "kmeans = KMeans(n_clusters = n_topics, random_state = 0).fit(concat)\n", + "labels = 
kmeans.labels_\n", + "\n", + "overall, filtered_a = [], []\n", + "for a in attentions:\n", + " f = [i for i in a if i[0] not in stopwords]\n", + " overall.extend(f)\n", + " filtered_a.append(f)\n", + "\n", + "o_ngram = generate_ngram(overall, ngram)\n", + "features = []\n", + "for i in o_ngram:\n", + " features.append(' '.join([w[0] for w in i]))\n", + "features = list(set(features))\n", + "\n", + "components = np.zeros((n_topics, len(features)))\n", + "for no, i in enumerate(labels):\n", + " if (no + 1) % 500 == 0:\n", + " print('processed %d'%(no + 1))\n", + " f = generate_ngram(filtered_a[no], ngram)\n", + " for w in f:\n", + " word = ' '.join([r[0] for r in w])\n", + " score = np.mean([r[1] for r in w])\n", + " if word in features:\n", + " components[i, features.index(word)] += score" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "def print_topics_modelling(\n", + " topics, feature_names, sorting, n_words = 20, return_df = True\n", + "):\n", + " if return_df:\n", + " try:\n", + " import pandas as pd\n", + " except:\n", + " raise Exception(\n", + " 'pandas not installed. Please install it and try again or set `return_df = False`'\n", + " )\n", + " df = {}\n", + " for i in range(topics):\n", + " words = []\n", + " for k in range(n_words):\n", + " words.append(feature_names[sorting[i, k]])\n", + " df['topic %d' % (i)] = words\n", + " if return_df:\n", + " return pd.DataFrame.from_dict(df)\n", + " else:\n", + " return df" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
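`print_topics_modelling` above only needs the `sorting` matrix: `np.argsort(components)[:, ::-1]` lists, for each topic, the feature indices from highest accumulated attention score to lowest. A toy illustration of that indexing trick (made-up features and scores, not the notebook's real `components`):

```python
import numpy as np

features = np.array(['bad', 'movie', 'plot', 'boring'])
# toy components matrix: 2 topics x 4 features of accumulated attention
components = np.array([[0.1, 0.9, 0.3, 0.2],
                       [0.7, 0.2, 0.1, 0.8]])
sorting = np.argsort(components)[:, ::-1]   # per-topic feature indices, descending
top_words = {'topic %d' % i: list(features[sorting[i, :2]]) for i in range(2)}
```

Indexing `features` with the first `n_words` columns of `sorting` is exactly how the topic table below is produced.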
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th><th>topic 0</th><th>topic 1</th><th>topic 2</th><th>topic 3</th><th>topic 4</th><th>topic 5</th><th>topic 6</th><th>topic 7</th><th>topic 8</th><th>topic 9</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td><td>movie</td></tr>\n",
+ "    <tr><th>1</th><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td><td>film</td></tr>\n",
+ "    <tr><th>2</th><td>comedy</td><td>characters</td><td>bad</td><td>plot</td><td>story</td><td>bad</td><td>characters</td><td>characters</td><td>story</td><td>story</td></tr>\n",
+ "    <tr><th>3</th><td>bad</td><td>story</td><td>story</td><td>bad</td><td>time</td><td>dull</td><td>time</td><td>story</td><td>comedy</td><td>films</td></tr>\n",
+ "    <tr><th>4</th><td>lame</td><td>time</td><td>films</td><td>comedy</td><td>director</td><td>story</td><td>story</td><td>feels</td><td>bad</td><td>characters</td></tr>\n",
+ "    <tr><th>5</th><td>dull</td><td>films</td><td>comedy</td><td>movies</td><td>movies</td><td>action</td><td>comedy</td><td>comedy</td><td>boring</td><td>time</td></tr>\n",
+ "    <tr><th>6</th><td>silly</td><td>bad</td><td>time</td><td>story</td><td>bad</td><td>comedy</td><td>action</td><td>love</td><td>tale</td><td>comedy</td></tr>\n",
+ "    <tr><th>7</th><td>mess</td><td>minutes</td><td>characters</td><td>time</td><td>comedy</td><td>thriller</td><td>script</td><td>script</td><td>dull</td><td>bad</td></tr>\n",
+ "    <tr><th>8</th><td>pretentious</td><td>action</td><td>movies</td><td>characters</td><td>characters</td><td>characters</td><td>films</td><td>character</td><td>predictable</td><td>script</td></tr>\n",
+ "    <tr><th>9</th><td>stupid</td><td>plot</td><td>plot</td><td>hard</td><td>reason</td><td>feels</td><td>director</td><td>action</td><td>movies</td><td>action</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "
" + ], + "text/plain": [ + " topic 0 topic 1 topic 2 topic 3 topic 4 topic 5 \\\n", + "0 movie movie movie movie movie movie \n", + "1 film film film film film film \n", + "2 comedy characters bad plot story bad \n", + "3 bad story story bad time dull \n", + "4 lame time films comedy director story \n", + "5 dull films comedy movies movies action \n", + "6 silly bad time story bad comedy \n", + "7 mess minutes characters time comedy thriller \n", + "8 pretentious action movies characters characters characters \n", + "9 stupid plot plot hard reason feels \n", + "\n", + " topic 6 topic 7 topic 8 topic 9 \n", + "0 movie movie movie movie \n", + "1 film film film film \n", + "2 characters characters story story \n", + "3 time story comedy films \n", + "4 story feels bad characters \n", + "5 comedy comedy boring time \n", + "6 action love tale comedy \n", + "7 script script dull bad \n", + "8 films character predictable script \n", + "9 director action movies action " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print_topics_modelling(\n", + " 10,\n", + " feature_names = np.array(features),\n", + " sorting = np.argsort(components)[:, ::-1],\n", + " n_words = 10,\n", + " return_df = True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/topic-model/3.xlnet-topic.ipynb b/topic-model/3.xlnet-topic.ipynb new file mode 100644 index 0000000..1635293 --- /dev/null +++ b/topic-model/3.xlnet-topic.ipynb @@ -0,0 
+1,1012 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download XLNET-Base model" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/xlnet/released_models/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "spiece.model\t xlnet_model.ckpt.data-00000-of-00001 xlnet_model.ckpt.meta\r\n", + "xlnet_config.json xlnet_model.ckpt.index\r\n" + ] + } + ], + "source": [ + "!ls xlnet_cased_L-12_H-768_A-12" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download simple dataset\n", + "\n", + "I want to use a negative sentiment corpus to build unsupervised topic models using attention from XLNET." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5330" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data/negative/negative\n", + "\n", + "with open('negative') as fopen:\n", + " negative = fopen.read().split('\\n')[:-1]\n", + "len(negative)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "import collections\n", + "import re\n", + "import numpy as np\n", + "import itertools\n", + "from prepro_utils import preprocess_text, encode_ids, encode_pieces\n", + "\n", + "SEG_ID_A = 0\n", + "SEG_ID_B = 1\n", + "SEG_ID_CLS = 2\n", + "SEG_ID_SEP = 3\n", + "SEG_ID_PAD = 4\n", + "\n", + "special_symbols = {\n", + " '<unk>': 0,\n", + " '<s>': 1,\n", + " '</s>': 2,\n", + " '<cls>': 3,\n", + " '<sep>': 4,\n", + " '<pad>': 5,\n", + " '<mask>': 6,\n", + " '<eod>': 7,\n", + " '<eop>': 8,\n", + "}\n",
+ "\n", + "UNK_ID = special_symbols['<unk>']\n", + "CLS_ID = special_symbols['<cls>']\n", + "SEP_ID = special_symbols['<sep>']\n", + "MASK_ID = special_symbols['<mask>']\n", + "EOD_ID = special_symbols['<eod>']\n", + "\n", + "def generate_ngram(seq, ngram = (1, 3)):\n", + " g = []\n", + " for i in range(ngram[0], ngram[-1] + 1):\n", + " g.extend(list(ngrams_generator(seq, i)))\n", + " return g\n", + "\n", + "def _pad_sequence(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " left_pad_symbol = None,\n", + " right_pad_symbol = None,\n", + "):\n", + " sequence = iter(sequence)\n", + " if pad_left:\n", + " sequence = itertools.chain((left_pad_symbol,) * (n - 1), sequence)\n", + " if pad_right:\n", + " sequence = itertools.chain(sequence, (right_pad_symbol,) * (n - 1))\n", + " return sequence\n", + "\n", + "\n", + "def ngrams_generator(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " left_pad_symbol = None,\n", + " right_pad_symbol = None,\n", + "):\n", + " \"\"\"\n", + " generate ngrams.\n", + "\n", + " Parameters\n", + " ----------\n", + " sequence : list of str\n", + " list of tokenize words.\n", + " n : int\n", + " ngram size\n", + "\n", + " Returns\n", + " -------\n", + " ngram: list\n", + " \"\"\"\n", + " sequence = _pad_sequence(\n", + " sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol\n", + " )\n", + "\n", + " history = []\n", + " while n > 1:\n", + " try:\n", + " next_item = next(sequence)\n", + " except StopIteration:\n", + " return\n", + " history.append(next_item)\n", + " n -= 1\n", + " for item in sequence:\n", + " history.append(item)\n", + " yield tuple(history)\n", + " del history[0]\n", + "\n", + "def tokenize_fn(text, sp_model):\n", + " text = preprocess_text(text, lower = False)\n", + " return encode_ids(sp_model, text) \n", + " \n", + "def merge_sentencepiece_tokens(paired_tokens, weighted = True):\n", + " new_paired_tokens = []\n", + " n_tokens = len(paired_tokens)\n", + " 
rejected = ['<cls>', '<sep>']\n", + "\n", + " i = 0\n", + "\n", + " while i < n_tokens:\n", + "\n", + " current_token, current_weight = paired_tokens[i]\n", + " if not current_token.startswith('▁') and current_token not in rejected:\n", + " previous_token, previous_weight = new_paired_tokens.pop()\n", + " merged_token = previous_token\n", + " merged_weight = [previous_weight]\n", + " while (\n", + " not current_token.startswith('▁')\n", + " and current_token not in rejected\n", + " ):\n", + " merged_token = merged_token + current_token.replace('▁', '')\n", + " merged_weight.append(current_weight)\n", + " i = i + 1\n", + " current_token, current_weight = paired_tokens[i]\n", + " merged_weight = np.mean(merged_weight)\n", + " new_paired_tokens.append((merged_token, merged_weight))\n", + "\n", + " else:\n", + " new_paired_tokens.append((current_token, current_weight))\n", + " i = i + 1\n", + "\n", + " words = [\n", + " i[0].replace('▁', '')\n", + " for i in new_paired_tokens\n", + " if i[0] not in ['<cls>', '<sep>', '<pad>']\n", + " ]\n", + " weights = [\n", + " i[1]\n", + " for i in new_paired_tokens\n", + " if i[0] not in ['<cls>', '<sep>', '<pad>']\n", + " ]\n", + " if weighted:\n", + " weights = np.array(weights)\n", + " weights = weights / np.sum(weights)\n", + " return list(zip(words, weights))\n", + "\n", + "def xlnet_tokenization(tokenizer, texts):\n", + " input_ids, input_masks, segment_ids, s_tokens = [], [], [], []\n", + " for text in texts:\n", + " tokens_a = tokenize_fn(text, tokenizer)\n", + " tokens = []\n", + " segment_id = []\n", + " for token in tokens_a:\n", + " tokens.append(token)\n", + " segment_id.append(SEG_ID_A)\n", + "\n", + " tokens.append(SEP_ID)\n", + " segment_id.append(SEG_ID_A)\n", + " tokens.append(CLS_ID)\n", + " segment_id.append(SEG_ID_CLS)\n", + "\n", + " input_id = tokens\n", + " input_mask = [0] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " 
s_tokens.append([tokenizer.IdToPiece(i) for i in tokens])\n", + "\n", + " maxlen = max([len(i) for i in input_ids])\n", + " input_ids = padding_sequence(input_ids, maxlen, padding = 'pre')\n", + " input_masks = padding_sequence(\n", + " input_masks, maxlen, padding = 'pre', pad_int = 1\n", + " )\n", + " segment_ids = padding_sequence(\n", + " segment_ids, maxlen, padding = 'pre', pad_int = SEG_ID_PAD\n", + " )\n", + "\n", + " return input_ids, input_masks, segment_ids, s_tokens\n", + "\n", + "def padding_sequence(seq, maxlen, padding = 'post', pad_int = 0):\n", + " padded_seqs = []\n", + " for s in seq:\n", + " if padding == 'post':\n", + " padded_seqs.append(s + [pad_int] * (maxlen - len(s)))\n", + " if padding == 'pre':\n", + " padded_seqs.append([pad_int] * (maxlen - len(s)) + s)\n", + " return padded_seqs\n", + "\n", + "def get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n", + " assignment_map = {}\n", + " initialized_variable_names = {}\n", + "\n", + " name_to_variable = collections.OrderedDict()\n", + " for var in tvars:\n", + " name = var.name\n", + " m = re.match('^(.*):\\\\d+$', name)\n", + " if m is not None:\n", + " name = m.group(1)\n", + " name_to_variable[name] = var\n", + "\n", + " init_vars = tf.train.list_variables(init_checkpoint)\n", + "\n", + " assignment_map = collections.OrderedDict()\n", + " for x in init_vars:\n", + " (name, var) = (x[0], x[1])\n", + " if name not in name_to_variable:\n", + " continue\n", + " assignment_map[name] = name_to_variable[name]\n", + " initialized_variable_names[name] = 1\n", + " initialized_variable_names[name + ':0'] = 1\n", + "\n", + " return (assignment_map, initialized_variable_names)\n", + "\n", + "\n", + "class _Model:\n", + " def __init__(self, xlnet_config, tokenizer, checkpoint, pool_mode = 'last'):\n", + "\n", + " kwargs = dict(\n", + " is_training = True,\n", + " use_tpu = False,\n", + " use_bfloat16 = False,\n", + " dropout = 0.0,\n", + " dropatt = 0.0,\n", + " init = 'normal',\n", + " 
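The packing convention in `xlnet_tokenization` differs from BERT: XLNet appends `<sep>` and then `<cls>` at the *end* of the sequence, and sequences are padded on the left (`'pre'`). A self-contained sketch of that packing, assuming the standard XLNet special-symbol ids (`<cls>` = 3, `<sep>` = 4, `<pad>` = 5); `pack_xlnet` is a hypothetical helper, not from the notebook:

```python
SEG_ID_A, SEG_ID_CLS, SEG_ID_PAD = 0, 2, 4
SEP_ID, CLS_ID, PAD_ID = 4, 3, 5   # assumed standard XLNet special-symbol ids

def pack_xlnet(token_ids, maxlen):
    # XLNet puts <sep> then <cls> at the END of the sequence
    ids = token_ids + [SEP_ID, CLS_ID]
    segs = [SEG_ID_A] * (len(token_ids) + 1) + [SEG_ID_CLS]
    pad = maxlen - len(ids)
    # pad on the left ('pre'); the mask uses 1 for padding, 0 for real tokens
    return (
        [PAD_ID] * pad + ids,
        [1] * pad + [0] * len(ids),
        [SEG_ID_PAD] * pad + segs,
    )

ids, mask, segs = pack_xlnet([10, 11, 12], maxlen = 6)
```

Note the inverted mask convention relative to BERT notebooks: here padding positions carry 1, matching `padding_sequence(..., pad_int = 1)` for the input masks above.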
init_range = 0.1,\n", + " init_std = 0.05,\n", + " clamp_len = -1,\n", + " )\n", + "\n", + " xlnet_parameters = xlnet_lib.RunConfig(**kwargs)\n", + "\n", + " self._tokenizer = tokenizer\n", + " _graph = tf.Graph()\n", + " with _graph.as_default():\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self.segment_ids = tf.placeholder(tf.int32, [None, None])\n", + " self.input_masks = tf.placeholder(tf.float32, [None, None])\n", + "\n", + " xlnet_model = xlnet_lib.XLNetModel(\n", + " xlnet_config = xlnet_config,\n", + " run_config = xlnet_parameters,\n", + " input_ids = tf.transpose(self.X, [1, 0]),\n", + " seg_ids = tf.transpose(self.segment_ids, [1, 0]),\n", + " input_mask = tf.transpose(self.input_masks, [1, 0]),\n", + " )\n", + "\n", + " self.logits = xlnet_model.get_pooled_out(pool_mode, True)\n", + " self._sess = tf.InteractiveSession()\n", + " self._sess.run(tf.global_variables_initializer())\n", + " tvars = tf.trainable_variables()\n", + " assignment_map, _ = get_assignment_map_from_checkpoint(\n", + " tvars, checkpoint\n", + " )\n", + " self._saver = tf.train.Saver(var_list = assignment_map)\n", + " attentions = [\n", + " n.name\n", + " for n in tf.get_default_graph().as_graph_def().node\n", + " if 'rel_attn/Softmax' in n.name\n", + " ]\n", + " g = tf.get_default_graph()\n", + " self.attention_nodes = [\n", + " g.get_tensor_by_name('%s:0' % (a)) for a in attentions\n", + " ]\n", + "\n", + " def vectorize(self, strings):\n", + " \"\"\"\n", + " Vectorize string inputs using xlnet attention.\n", + "\n", + " Parameters\n", + " ----------\n", + " strings : str / list of str\n", + "\n", + " Returns\n", + " -------\n", + " array: vectorized strings\n", + " \"\"\"\n", + "\n", + " if isinstance(strings, list):\n", + " if not isinstance(strings[0], str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " else:\n", + " if not isinstance(strings, str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " 
if isinstance(strings, str):\n", + " strings = [strings]\n", + "\n", + " input_ids, input_masks, segment_ids, _ = xlnet_tokenization(\n", + " self._tokenizer, strings\n", + " )\n", + " return self._sess.run(\n", + " self.logits,\n", + " feed_dict = {\n", + " self.X: input_ids,\n", + " self.segment_ids: segment_ids,\n", + " self.input_masks: input_masks,\n", + " },\n", + " )\n", + "\n", + " def attention(self, strings, method = 'last', **kwargs):\n", + " \"\"\"\n", + " Get attention string inputs from xlnet attention.\n", + "\n", + " Parameters\n", + " ----------\n", + " strings : str / list of str\n", + " method : str, optional (default='last')\n", + " Attention layer supported. Allowed values:\n", + "\n", + " * ``'last'`` - attention from last layer.\n", + " * ``'first'`` - attention from first layer.\n", + " * ``'mean'`` - average attentions from all layers.\n", + "\n", + " Returns\n", + " -------\n", + " array: attention\n", + " \"\"\"\n", + "\n", + " if isinstance(strings, list):\n", + " if not isinstance(strings[0], str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " else:\n", + " if not isinstance(strings, str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " if isinstance(strings, str):\n", + " strings = [strings]\n", + "\n", + " method = method.lower()\n", + " if method not in ['last', 'first', 'mean']:\n", + " raise Exception(\n", + " \"method not supported, only support ['last', 'first', 'mean']\"\n", + " )\n", + "\n", + " input_ids, input_masks, segment_ids, s_tokens = xlnet_tokenization(\n", + " self._tokenizer, strings\n", + " )\n", + " maxlen = max([len(s) for s in s_tokens])\n", + " s_tokens = padding_sequence(s_tokens, maxlen, pad_int = '<cls>')\n", + " attentions = self._sess.run(\n", + " self.attention_nodes,\n", + " feed_dict = {\n", + " self.X: input_ids,\n", + " self.segment_ids: segment_ids,\n", + " self.input_masks: input_masks,\n", + " },\n", + " )\n", + "\n", + " if method == 
'first':\n", + " cls_attn = np.transpose(attentions[0][:, 0], (1, 0, 2))\n", + "\n", + " if method == 'last':\n", + " cls_attn = np.transpose(attentions[-1][:, 0], (1, 0, 2))\n", + "\n", + " if method == 'mean':\n", + " cls_attn = np.transpose(\n", + " np.mean(attentions, axis = 0).mean(axis = 1), (1, 0, 2)\n", + " )\n", + "\n", + " cls_attn = np.mean(cls_attn, axis = 1)\n", + " total_weights = np.sum(cls_attn, axis = -1, keepdims = True)\n", + " attn = cls_attn / total_weights\n", + " output = []\n", + " for i in range(attn.shape[0]):\n", + " output.append(\n", + " merge_sentencepiece_tokens(list(zip(s_tokens[i], attn[i])))\n", + " )\n", + " return output" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be 
understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint8 = np.dtype([(\"qint8\", np.int8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_quint8 = np.dtype([(\"quint8\", np.uint8, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint16 = np.dtype([(\"qint16\", np.int16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " 
_np_quint16 = np.dtype([(\"quint16\", np.uint16, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " _np_qint32 = np.dtype([(\"qint32\", np.int32, 1)])\n", + "/home/husein/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.\n", + " np_resource = np.dtype([(\"resource\", np.ubyte, 1)])\n", + "WARNING: Logging before flag parsing goes to stderr.\n", + "W0831 03:24:51.882980 139627729786688 deprecation_wrapper.py:119] From /home/husein/local/xlnet.py:70: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n", + "W0831 03:24:51.902457 139627729786688 deprecation_wrapper.py:119] From /home/husein/local/xlnet.py:253: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "W0831 03:24:51.903432 139627729786688 deprecation_wrapper.py:119] From /home/husein/local/xlnet.py:253: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.\n", + "\n", + "W0831 03:24:51.905369 139627729786688 deprecation_wrapper.py:119] From /home/husein/local/modeling.py:686: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.\n", + "\n", + "W0831 03:24:51.906327 139627729786688 deprecation_wrapper.py:119] From /home/husein/local/modeling.py:693: The name tf.get_variable is deprecated. 
Please use tf.compat.v1.get_variable instead.\n", + "\n", + "W0831 03:24:51.970893 139627729786688 deprecation.py:323] From /home/husein/local/modeling.py:797: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dropout instead.\n", + "W0831 03:24:53.032129 139627729786688 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0831 03:24:53.051974 139627729786688 deprecation.py:323] From /home/husein/local/modeling.py:99: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n", + "W0831 03:25:01.478370 139627729786688 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use standard file APIs to check for files with this prefix.\n" + ] + } + ], + "source": [ + "import sentencepiece as spm\n", + "import xlnet as xlnet_lib\n", + "import tensorflow as tf\n", + "\n", + "sp_model = spm.SentencePieceProcessor()\n", + "sp_model.Load('xlnet_cased_L-12_H-768_A-12/spiece.model')\n", + "xlnet_config = xlnet_lib.XLNetConfig(\n", + " json_path = 'xlnet_cased_L-12_H-768_A-12/xlnet_config.json'\n", + ")\n", + "xlnet_checkpoint = 'xlnet_cased_L-12_H-768_A-12/xlnet_model.ckpt'\n", + "model = _Model(\n", + " xlnet_config, sp_model, xlnet_checkpoint, pool_mode 
= 'last'\n", + ")\n", + "model._saver.restore(model._sess, xlnet_checkpoint)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[-0.9849674 , -0.67051286, 0.99992144, ..., -0.99999577,\n", + " -0.99904066, -0.85338414],\n", + " [-0.64562905, -0.7124205 , 0.99995923, ..., -0.9999986 ,\n", + " -0.9997058 , -0.9994817 ]], dtype=float32)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v = model.vectorize(['hello nice to meet u', 'so long sucker'])\n", + "v" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 768)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[('hello', 0.6345004),\n", + " ('nice', 0.22861008),\n", + " ('to', 0.04936926),\n", + " ('meet', 0.022651237),\n", + " ('u', 0.064868994)],\n", + " [('so', 0.117449395), ('long', 0.13799533), ('sucker', 0.74455523)]]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.attention(['hello nice to meet u', 'so long sucker'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building topic modeling" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 10\n", + "ngram = (1, 3)\n", + "n_topics = 10" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 533/533 [00:41<00:00, 12.94it/s]\n" + ] + } + ], + "source": [ + "from sklearn.cluster import KMeans\n", + "from tqdm import tqdm\n", + "\n", + "rows, 
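The vectorize and attention calls below run in fixed-size batches so memory stays bounded; the last slice may be shorter than `batch_size`. A generic sketch of that batching pattern, where `fake_vectorize` is a hypothetical stand-in for `model.vectorize`:

```python
import numpy as np

def batched(items, batch_size):
    # yield consecutive slices; the last batch may be smaller
    for i in range(0, len(items), batch_size):
        yield items[i:min(i + batch_size, len(items))]

# hypothetical stand-in for model.vectorize: one 4-dim vector per string
def fake_vectorize(batch):
    return np.zeros((len(batch), 4))

texts = ['some sentence'] * 23
rows = [fake_vectorize(b) for b in batched(texts, 10)]
concat = np.concatenate(rows, axis = 0)
```

Concatenating the per-batch arrays afterwards reproduces one row per input string, exactly what `KMeans` is fit on later.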
attentions = [], []\n", + "for i in tqdm(range(0, len(negative), batch_size)):\n", + " index = min(i + batch_size, len(negative))\n", + " rows.append(model.vectorize(negative[i:index]))\n", + " attentions.extend(model.attention(negative[i:index]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download simple English stopwords\n", + "\n", + "You might want to gather more stopwords." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2019-08-31 03:25:45-- https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.json\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.192.133, 151.101.128.133, 151.101.64.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.192.133|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 10275 (10K) [text/plain]\n", + "Saving to: ‘stopwords-en.json’\n", + "\n", + "stopwords-en.json 100%[===================>] 10.03K --.-KB/s in 0s \n", + "\n", + "2019-08-31 03:25:46 (52.9 MB/s) - ‘stopwords-en.json’ saved [10275/10275]\n", + "\n" + ] + } + ], + "source": [ + "!wget https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.json" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1298" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "with open('stopwords-en.json') as fopen:\n", + " stopwords = json.load(fopen)\n", + "len(stopwords)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processed 500\n", + "processed 1000\n", + "processed 1500\n", + "processed 2000\n", + "processed 
2500\n", + "processed 3000\n", + "processed 3500\n", + "processed 4000\n", + "processed 4500\n", + "processed 5000\n" + ] + } + ], + "source": [ + "concat = np.concatenate(rows, axis = 0)\n", + "kmeans = KMeans(n_clusters = n_topics, random_state = 0).fit(concat)\n", + "labels = kmeans.labels_\n", + "\n", + "overall, filtered_a = [], []\n", + "for a in attentions:\n", + " f = [i for i in a if i[0] not in stopwords]\n", + " overall.extend(f)\n", + " filtered_a.append(f)\n", + "\n", + "o_ngram = generate_ngram(overall, ngram)\n", + "features = []\n", + "for i in o_ngram:\n", + " features.append(' '.join([w[0] for w in i]))\n", + "features = list(set(features))\n", + "\n", + "components = np.zeros((n_topics, len(features)))\n", + "for no, i in enumerate(labels):\n", + " if (no + 1) % 500 == 0:\n", + " print('processed %d'%(no + 1))\n", + " f = generate_ngram(filtered_a[no], ngram)\n", + " for w in f:\n", + " word = ' '.join([r[0] for r in w])\n", + " score = np.mean([r[1] for r in w])\n", + " if word in features:\n", + " components[i, features.index(word)] += score" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def print_topics_modelling(\n", + " topics, feature_names, sorting, n_words = 20, return_df = True\n", + "):\n", + " if return_df:\n", + " try:\n", + " import pandas as pd\n", + " except:\n", + " raise Exception(\n", + " 'pandas not installed. Please install it and try again or set `return_df = False`'\n", + " )\n", + " df = {}\n", + " for i in range(topics):\n", + " words = []\n", + " for k in range(n_words):\n", + " words.append(feature_names[sorting[i, k]])\n", + " df['topic %d' % (i)] = words\n", + " if return_df:\n", + " return pd.DataFrame.from_dict(df)\n", + " else:\n", + " return df" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
topic 0topic 1topic 2topic 3topic 4topic 5topic 6topic 7topic 8topic 9
0moviemoviefilmfilmmoviemoviemoviefilmmusicmovie
1filmfilmmoviemoviefilmfilmfilmmovietwistsfilm
2comedybadtimecharacterscharacterslifegloryactionfilmcomedy
3badpretentiousdramacomedymoviesboringstorybadrespectablebad
4horriblecomedyfilmsintentionsstorybadtensionbikepacedclue
5lovesillybadfilmscomedyminutesbmoviedrrohypnolstory
6junkdullstorystoryfeelswastedistressinglysubjecttiresomevideo
7timeboringsettimeaudienceperformanceshollywoodfilmsteengangplot
8effortmessreganbibbidybobbidiblandbadstoryguiltyhardimposterscript
9exerciseplainminutesboringblackthrillermaterialmotherthrillerscenes
\n", + "
" + ], + "text/plain": [ + " topic 0 topic 1 topic 2 topic 3 topic 4 \\\n", + "0 movie movie film film movie \n", + "1 film film movie movie film \n", + "2 comedy bad time characters characters \n", + "3 bad pretentious drama comedy movies \n", + "4 horrible comedy films intentions story \n", + "5 love silly bad films comedy \n", + "6 junk dull story story feels \n", + "7 time boring set time audience \n", + "8 effort mess regan bibbidybobbidibland bad \n", + "9 exercise plain minutes boring black \n", + "\n", + " topic 5 topic 6 topic 7 topic 8 topic 9 \n", + "0 movie movie film music movie \n", + "1 film film movie twists film \n", + "2 life glory action film comedy \n", + "3 boring story bad respectable bad \n", + "4 bad tension bike paced clue \n", + "5 minutes bmovie dr rohypnol story \n", + "6 waste distressingly subject tiresome video \n", + "7 performances hollywood films teengang plot \n", + "8 story guilty hard imposter script \n", + "9 thriller material mother thriller scenes " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print_topics_modelling(\n", + " 10,\n", + " feature_names = np.array(features),\n", + " sorting = np.argsort(components)[:, ::-1],\n", + " n_words = 10,\n", + " return_df = True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/topic-model/modeling.py b/topic-model/modeling.py new file mode 100644 index 0000000..b0fc2b4 --- /dev/null +++ b/topic-model/modeling.py @@ -0,0 
+1,1127 @@ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import numpy as np +import tensorflow as tf + + +def gelu(x): + """Gaussian Error Linear Unit. + + This is a smoother version of the RELU. + Original paper: https://arxiv.org/abs/1606.08415 + Args: + x: float Tensor to perform activation. + + Returns: + `x` with the GELU activation applied. + """ + cdf = 0.5 * ( + 1.0 + tf.tanh((np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))) + ) + return x * cdf + + +def embedding_lookup( + x, + n_token, + d_embed, + initializer, + use_tpu = True, + scope = 'embedding', + reuse = None, + dtype = tf.float32, +): + """TPU and GPU embedding_lookup function.""" + with tf.variable_scope(scope, reuse = reuse): + lookup_table = tf.get_variable( + 'lookup_table', + [n_token, d_embed], + dtype = dtype, + initializer = initializer, + ) + if use_tpu: + one_hot_idx = tf.one_hot(x, n_token, dtype = dtype) + if one_hot_idx.shape.ndims == 2: + return ( + tf.einsum('in,nd->id', one_hot_idx, lookup_table), + lookup_table, + ) + else: + return ( + tf.einsum('ibn,nd->ibd', one_hot_idx, lookup_table), + lookup_table, + ) + else: + return tf.nn.embedding_lookup(lookup_table, x), lookup_table + + +def positional_embedding(pos_seq, inv_freq, bsz = None): + sinusoid_inp = tf.einsum('i,d->id', pos_seq, inv_freq) + pos_emb = tf.concat([tf.sin(sinusoid_inp), tf.cos(sinusoid_inp)], -1) + pos_emb = pos_emb[:, None, :] + + if bsz is not None: + pos_emb = tf.tile(pos_emb, [1, bsz, 1]) + + return pos_emb + + +def positionwise_ffn( + inp, + d_model, + d_inner, + dropout, + kernel_initializer, + activation_type = 'relu', + scope = 'ff', + is_training = True, + reuse = None, +): + """Position-wise Feed-forward Network.""" + if activation_type == 'relu': + activation = tf.nn.relu + elif activation_type == 'gelu': + activation = gelu + else: + raise ValueError( + 'Unsupported activation type {}'.format(activation_type) + ) + + output = 
inp + with tf.variable_scope(scope, reuse = reuse): + output = tf.layers.dense( + output, + d_inner, + activation = activation, + kernel_initializer = kernel_initializer, + name = 'layer_1', + ) + output = tf.layers.dropout( + output, dropout, training = is_training, name = 'drop_1' + ) + output = tf.layers.dense( + output, + d_model, + kernel_initializer = kernel_initializer, + name = 'layer_2', + ) + output = tf.layers.dropout( + output, dropout, training = is_training, name = 'drop_2' + ) + output = tf.contrib.layers.layer_norm( + output + inp, begin_norm_axis = -1, scope = 'LayerNorm' + ) + return output + + +def head_projection(h, d_model, n_head, d_head, kernel_initializer, name): + """Project hidden states to a specific head with a 4D-shape.""" + proj_weight = tf.get_variable( + '{}/kernel'.format(name), + [d_model, n_head, d_head], + dtype = h.dtype, + initializer = kernel_initializer, + ) + head = tf.einsum('ibh,hnd->ibnd', h, proj_weight) + + return head + + +def post_attention( + h, + attn_vec, + d_model, + n_head, + d_head, + dropout, + is_training, + kernel_initializer, + residual = True, +): + """Post-attention processing.""" + # post-attention projection (back to `d_model`) + proj_o = tf.get_variable( + 'o/kernel', + [d_model, n_head, d_head], + dtype = h.dtype, + initializer = kernel_initializer, + ) + attn_out = tf.einsum('ibnd,hnd->ibh', attn_vec, proj_o) + + attn_out = tf.layers.dropout(attn_out, dropout, training = is_training) + if residual: + output = tf.contrib.layers.layer_norm( + attn_out + h, begin_norm_axis = -1, scope = 'LayerNorm' + ) + else: + output = tf.contrib.layers.layer_norm( + attn_out, begin_norm_axis = -1, scope = 'LayerNorm' + ) + + return output + + +def abs_attn_core( + q_head, k_head, v_head, attn_mask, dropatt, is_training, scale +): + """Core absolute positional attention operations.""" + + attn_score = tf.einsum('ibnd,jbnd->ijbn', q_head, k_head) + attn_score *= scale + if attn_mask is not None: + attn_score = 
attn_score - 1e30 * attn_mask + + # attention probability + attn_prob = tf.nn.softmax(attn_score, 1) + attn_prob = tf.layers.dropout(attn_prob, dropatt, training = is_training) + + # attention output + attn_vec = tf.einsum('ijbn,jbnd->ibnd', attn_prob, v_head) + + return attn_vec + + +def rel_attn_core( + q_head, + k_head_h, + v_head_h, + k_head_r, + seg_embed, + seg_mat, + r_w_bias, + r_r_bias, + r_s_bias, + attn_mask, + dropatt, + is_training, + scale, +): + """Core relative positional attention operations.""" + + # content based attention score + ac = tf.einsum('ibnd,jbnd->ijbn', q_head + r_w_bias, k_head_h) + + # position based attention score + bd = tf.einsum('ibnd,jbnd->ijbn', q_head + r_r_bias, k_head_r) + bd = rel_shift(bd, klen = tf.shape(ac)[1]) + + # segment based attention score + if seg_mat is None: + ef = 0 + else: + ef = tf.einsum('ibnd,snd->ibns', q_head + r_s_bias, seg_embed) + ef = tf.einsum('ijbs,ibns->ijbn', seg_mat, ef) + + # merge attention scores and perform masking + attn_score = (ac + bd + ef) * scale + if attn_mask is not None: + # attn_score = attn_score * (1 - attn_mask) - 1e30 * attn_mask + attn_score = attn_score - 1e30 * attn_mask + + # attention probability + attn_prob = tf.nn.softmax(attn_score, 1) + attn_prob = tf.layers.dropout(attn_prob, dropatt, training = is_training) + + # attention output + attn_vec = tf.einsum('ijbn,jbnd->ibnd', attn_prob, v_head_h) + + return attn_vec + + +def rel_shift(x, klen = -1): + """perform relative shift to form the relative attention score.""" + x_size = tf.shape(x) + + x = tf.reshape(x, [x_size[1], x_size[0], x_size[2], x_size[3]]) + x = tf.slice(x, [1, 0, 0, 0], [-1, -1, -1, -1]) + x = tf.reshape(x, [x_size[0], x_size[1] - 1, x_size[2], x_size[3]]) + x = tf.slice(x, [0, 0, 0, 0], [-1, klen, -1, -1]) + + return x + + +def _create_mask(qlen, mlen, dtype = tf.float32, same_length = False): + """create causal attention mask.""" + attn_mask = tf.ones([qlen, qlen], dtype = dtype) + mask_u = 
tf.matrix_band_part(attn_mask, 0, -1) + mask_dia = tf.matrix_band_part(attn_mask, 0, 0) + attn_mask_pad = tf.zeros([qlen, mlen], dtype = dtype) + ret = tf.concat([attn_mask_pad, mask_u - mask_dia], 1) + if same_length: + mask_l = tf.matrix_band_part(attn_mask, -1, 0) + ret = tf.concat([ret[:, :qlen] + mask_l - mask_dia, ret[:, qlen:]], 1) + + return ret + + +def _cache_mem(curr_out, prev_mem, mem_len, reuse_len = None): + """cache hidden states into memory.""" + if mem_len is None or mem_len == 0: + return None + else: + if reuse_len is not None and reuse_len > 0: + curr_out = curr_out[:reuse_len] + + if prev_mem is None: + new_mem = curr_out[-mem_len:] + else: + new_mem = tf.concat([prev_mem, curr_out], 0)[-mem_len:] + + return tf.stop_gradient(new_mem) + + +def relative_positional_encoding( + qlen, klen, d_model, clamp_len, attn_type, bi_data, bsz = None, dtype = None +): + """create relative positional encoding.""" + freq_seq = tf.range(0, d_model, 2.0) + if dtype is not None and dtype != tf.float32: + freq_seq = tf.cast(freq_seq, dtype = dtype) + inv_freq = 1 / (10000 ** (freq_seq / d_model)) + + if attn_type == 'bi': + # beg, end = klen - 1, -qlen + beg, end = klen, -qlen + elif attn_type == 'uni': + # beg, end = klen - 1, -1 + beg, end = klen, -1 + else: + raise ValueError('Unknown `attn_type` {}.'.format(attn_type)) + + if bi_data: + fwd_pos_seq = tf.range(beg, end, -1.0) + bwd_pos_seq = tf.range(-beg, -end, 1.0) + + if dtype is not None and dtype != tf.float32: + fwd_pos_seq = tf.cast(fwd_pos_seq, dtype = dtype) + bwd_pos_seq = tf.cast(bwd_pos_seq, dtype = dtype) + + if clamp_len > 0: + fwd_pos_seq = tf.clip_by_value(fwd_pos_seq, -clamp_len, clamp_len) + bwd_pos_seq = tf.clip_by_value(bwd_pos_seq, -clamp_len, clamp_len) + + if bsz is not None: + # With bi_data, the batch size should be divisible by 2. 
+ assert bsz % 2 == 0 + fwd_pos_emb = positional_embedding(fwd_pos_seq, inv_freq, bsz // 2) + bwd_pos_emb = positional_embedding(bwd_pos_seq, inv_freq, bsz // 2) + else: + fwd_pos_emb = positional_embedding(fwd_pos_seq, inv_freq) + bwd_pos_emb = positional_embedding(bwd_pos_seq, inv_freq) + + pos_emb = tf.concat([fwd_pos_emb, bwd_pos_emb], axis = 1) + else: + fwd_pos_seq = tf.range(beg, end, -1.0) + if dtype is not None and dtype != tf.float32: + fwd_pos_seq = tf.cast(fwd_pos_seq, dtype = dtype) + if clamp_len > 0: + fwd_pos_seq = tf.clip_by_value(fwd_pos_seq, -clamp_len, clamp_len) + pos_emb = positional_embedding(fwd_pos_seq, inv_freq, bsz) + + return pos_emb + + +def multihead_attn( + q, + k, + v, + attn_mask, + d_model, + n_head, + d_head, + dropout, + dropatt, + is_training, + kernel_initializer, + residual = True, + scope = 'abs_attn', + reuse = None, +): + """Standard multi-head attention with absolute positional embedding.""" + + scale = 1 / (d_head ** 0.5) + with tf.variable_scope(scope, reuse = reuse): + # attention heads + q_head = head_projection( + q, d_model, n_head, d_head, kernel_initializer, 'q' + ) + k_head = head_projection( + k, d_model, n_head, d_head, kernel_initializer, 'k' + ) + v_head = head_projection( + v, d_model, n_head, d_head, kernel_initializer, 'v' + ) + + # attention vector + attn_vec = abs_attn_core( + q_head, k_head, v_head, attn_mask, dropatt, is_training, scale + ) + + # post processing + output = post_attention( + v, + attn_vec, + d_model, + n_head, + d_head, + dropout, + is_training, + kernel_initializer, + residual, + ) + + return output + + +def rel_multihead_attn( + h, + r, + r_w_bias, + r_r_bias, + seg_mat, + r_s_bias, + seg_embed, + attn_mask, + mems, + d_model, + n_head, + d_head, + dropout, + dropatt, + is_training, + kernel_initializer, + scope = 'rel_attn', + reuse = None, +): + """Multi-head attention with relative positional encoding.""" + + scale = 1 / (d_head ** 0.5) + with tf.variable_scope(scope, reuse = 
reuse): + if mems is not None and mems.shape.ndims > 1: + cat = tf.concat([mems, h], 0) + else: + cat = h + + # content heads + q_head_h = head_projection( + h, d_model, n_head, d_head, kernel_initializer, 'q' + ) + k_head_h = head_projection( + cat, d_model, n_head, d_head, kernel_initializer, 'k' + ) + v_head_h = head_projection( + cat, d_model, n_head, d_head, kernel_initializer, 'v' + ) + + # positional heads + k_head_r = head_projection( + r, d_model, n_head, d_head, kernel_initializer, 'r' + ) + + # core attention ops + attn_vec = rel_attn_core( + q_head_h, + k_head_h, + v_head_h, + k_head_r, + seg_embed, + seg_mat, + r_w_bias, + r_r_bias, + r_s_bias, + attn_mask, + dropatt, + is_training, + scale, + ) + + # post processing + output = post_attention( + h, + attn_vec, + d_model, + n_head, + d_head, + dropout, + is_training, + kernel_initializer, + ) + + return output + + +def two_stream_rel_attn( + h, + g, + r, + mems, + r_w_bias, + r_r_bias, + seg_mat, + r_s_bias, + seg_embed, + attn_mask_h, + attn_mask_g, + target_mapping, + d_model, + n_head, + d_head, + dropout, + dropatt, + is_training, + kernel_initializer, + scope = 'rel_attn', +): + """Two-stream attention with relative positional encoding.""" + + scale = 1 / (d_head ** 0.5) + with tf.variable_scope(scope, reuse = False): + + # content based attention score + if mems is not None and mems.shape.ndims > 1: + cat = tf.concat([mems, h], 0) + else: + cat = h + + # content-based key head + k_head_h = head_projection( + cat, d_model, n_head, d_head, kernel_initializer, 'k' + ) + + # content-based value head + v_head_h = head_projection( + cat, d_model, n_head, d_head, kernel_initializer, 'v' + ) + + # position-based key head + k_head_r = head_projection( + r, d_model, n_head, d_head, kernel_initializer, 'r' + ) + + ##### h-stream + # content-stream query head + q_head_h = head_projection( + h, d_model, n_head, d_head, kernel_initializer, 'q' + ) + + # core attention ops + attn_vec_h = rel_attn_core( + 
q_head_h, + k_head_h, + v_head_h, + k_head_r, + seg_embed, + seg_mat, + r_w_bias, + r_r_bias, + r_s_bias, + attn_mask_h, + dropatt, + is_training, + scale, + ) + + # post processing + output_h = post_attention( + h, + attn_vec_h, + d_model, + n_head, + d_head, + dropout, + is_training, + kernel_initializer, + ) + + with tf.variable_scope(scope, reuse = True): + ##### g-stream + # query-stream query head + q_head_g = head_projection( + g, d_model, n_head, d_head, kernel_initializer, 'q' + ) + + # core attention ops + if target_mapping is not None: + q_head_g = tf.einsum('mbnd,mlb->lbnd', q_head_g, target_mapping) + attn_vec_g = rel_attn_core( + q_head_g, + k_head_h, + v_head_h, + k_head_r, + seg_embed, + seg_mat, + r_w_bias, + r_r_bias, + r_s_bias, + attn_mask_g, + dropatt, + is_training, + scale, + ) + attn_vec_g = tf.einsum('lbnd,mlb->mbnd', attn_vec_g, target_mapping) + else: + attn_vec_g = rel_attn_core( + q_head_g, + k_head_h, + v_head_h, + k_head_r, + seg_embed, + seg_mat, + r_w_bias, + r_r_bias, + r_s_bias, + attn_mask_g, + dropatt, + is_training, + scale, + ) + + # post processing + output_g = post_attention( + g, + attn_vec_g, + d_model, + n_head, + d_head, + dropout, + is_training, + kernel_initializer, + ) + + return output_h, output_g + + +def transformer_xl( + inp_k, + n_token, + n_layer, + d_model, + n_head, + d_head, + d_inner, + dropout, + dropatt, + attn_type, + bi_data, + initializer, + is_training, + mem_len = None, + inp_q = None, + mems = None, + same_length = False, + clamp_len = -1, + untie_r = False, + use_tpu = True, + input_mask = None, + perm_mask = None, + seg_id = None, + reuse_len = None, + ff_activation = 'relu', + target_mapping = None, + use_bfloat16 = False, + scope = 'transformer', + **kwargs +): + """ + Defines a Transformer-XL computation graph with additional + support for XLNet. + + Args: + + inp_k: int32 Tensor in shape [len, bsz], the input token IDs. + seg_id: int32 Tensor in shape [len, bsz], the input segment IDs. 
+ input_mask: float32 Tensor in shape [len, bsz], the input mask. + 0 for real tokens and 1 for padding. + mems: a list of float32 Tensors in shape [mem_len, bsz, d_model], memory + from previous batches. The length of the list equals n_layer. + If None, no memory is used. + perm_mask: float32 Tensor in shape [len, len, bsz]. + If perm_mask[i, j, k] = 0, i attends to j in batch k; + if perm_mask[i, j, k] = 1, i does not attend to j in batch k. + If None, each position attends to all the others. + target_mapping: float32 Tensor in shape [num_predict, len, bsz]. + If target_mapping[i, j, k] = 1, the i-th prediction in batch k is + on the j-th token. + Only used during pretraining for partial prediction. + Set to None during finetuning. + inp_q: float32 Tensor in shape [len, bsz]. + 1 for tokens with losses and 0 for tokens without losses. + Only used during pretraining for two-stream attention. + Set to None during finetuning. + + n_layer: int, the number of layers. + d_model: int, the hidden size. + n_head: int, the number of attention heads. + d_head: int, the dimension size of each attention head. + d_inner: int, the hidden size in feed-forward layers. + ff_activation: str, "relu" or "gelu". + untie_r: bool, whether to untie the biases in attention. + n_token: int, the vocab size. + + is_training: bool, whether in training mode. + use_tpu: bool, whether TPUs are used. + use_bfloat16: bool, use bfloat16 instead of float32. + dropout: float, dropout rate. + dropatt: float, dropout rate on attention probabilities. + init: str, the initialization scheme, either "normal" or "uniform". + init_range: float, initialize the parameters with a uniform distribution + in [-init_range, init_range]. Only effective when init="uniform". + init_std: float, initialize the parameters with a normal distribution + with mean 0 and stddev init_std. Only effective when init="normal". + mem_len: int, the number of tokens to cache. 
+ reuse_len: int, the number of tokens in the current batch to be cached + and reused in the future. + bi_data: bool, whether to use bidirectional input pipeline. + Usually set to True during pretraining and False during finetuning. + clamp_len: int, clamp all relative distances larger than clamp_len. + -1 means no clamping. + same_length: bool, whether to use the same attention length for each token. + summary_type: str, "last", "first", "mean", or "attn". The method + to pool the input to get a vector representation. + initializer: A tf initializer. + scope: scope name for the computation graph. + """ + tf.logging.info('memory input {}'.format(mems)) + tf_float = tf.bfloat16 if use_bfloat16 else tf.float32 + tf.logging.info('Use float type {}'.format(tf_float)) + + new_mems = [] + with tf.variable_scope(scope): + if untie_r: + r_w_bias = tf.get_variable( + 'r_w_bias', + [n_layer, n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + r_r_bias = tf.get_variable( + 'r_r_bias', + [n_layer, n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + else: + r_w_bias = tf.get_variable( + 'r_w_bias', + [n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + r_r_bias = tf.get_variable( + 'r_r_bias', + [n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + + bsz = tf.shape(inp_k)[1] + qlen = tf.shape(inp_k)[0] + mlen = tf.shape(mems[0])[0] if mems is not None else 0 + klen = mlen + qlen + + ##### Attention mask + # causal attention mask + if attn_type == 'uni': + attn_mask = _create_mask(qlen, mlen, tf_float, same_length) + attn_mask = attn_mask[:, :, None, None] + elif attn_type == 'bi': + attn_mask = None + else: + raise ValueError('Unsupported attention type: {}'.format(attn_type)) + + # data mask: input mask & perm mask + if input_mask is not None and perm_mask is not None: + data_mask = input_mask[None] + perm_mask + elif input_mask is not None and perm_mask is None: + data_mask = input_mask[None] + 
elif input_mask is None and perm_mask is not None: + data_mask = perm_mask + else: + data_mask = None + + if data_mask is not None: + # all mems can be attended to + mems_mask = tf.zeros( + [tf.shape(data_mask)[0], mlen, bsz], dtype = tf_float + ) + data_mask = tf.concat([mems_mask, data_mask], 1) + if attn_mask is None: + attn_mask = data_mask[:, :, :, None] + else: + attn_mask += data_mask[:, :, :, None] + + if attn_mask is not None: + attn_mask = tf.cast(attn_mask > 0, dtype = tf_float) + + if attn_mask is not None: + non_tgt_mask = -tf.eye(qlen, dtype = tf_float) + non_tgt_mask = tf.concat( + [tf.zeros([qlen, mlen], dtype = tf_float), non_tgt_mask], + axis = -1, + ) + non_tgt_mask = tf.cast( + (attn_mask + non_tgt_mask[:, :, None, None]) > 0, + dtype = tf_float, + ) + else: + non_tgt_mask = None + + ##### Word embedding + word_emb_k, lookup_table = embedding_lookup( + x = inp_k, + n_token = n_token, + d_embed = d_model, + initializer = initializer, + use_tpu = use_tpu, + dtype = tf_float, + scope = 'word_embedding', + ) + + if inp_q is not None: + with tf.variable_scope('mask_emb'): + mask_emb = tf.get_variable( + 'mask_emb', [1, 1, d_model], dtype = tf_float + ) + if target_mapping is not None: + word_emb_q = tf.tile( + mask_emb, [tf.shape(target_mapping)[0], bsz, 1] + ) + else: + inp_q_ext = inp_q[:, :, None] + word_emb_q = ( + inp_q_ext * mask_emb + (1 - inp_q_ext) * word_emb_k + ) + output_h = tf.layers.dropout( + word_emb_k, dropout, training = is_training + ) + if inp_q is not None: + output_g = tf.layers.dropout( + word_emb_q, dropout, training = is_training + ) + + ##### Segment embedding + if seg_id is not None: + if untie_r: + r_s_bias = tf.get_variable( + 'r_s_bias', + [n_layer, n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + else: + # default case (tie) + r_s_bias = tf.get_variable( + 'r_s_bias', + [n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + + seg_embed = tf.get_variable( + 'seg_embed', + 
[n_layer, 2, n_head, d_head], + dtype = tf_float, + initializer = initializer, + ) + + # Convert `seg_id` to one-hot `seg_mat` + mem_pad = tf.zeros([mlen, bsz], dtype = tf.int32) + cat_ids = tf.concat([mem_pad, seg_id], 0) + + # `1` indicates not in the same segment [qlen x klen x bsz] + seg_mat = tf.cast( + tf.logical_not(tf.equal(seg_id[:, None], cat_ids[None, :])), + tf.int32, + ) + seg_mat = tf.one_hot(seg_mat, 2, dtype = tf_float) + else: + seg_mat = None + + ##### Positional encoding + pos_emb = relative_positional_encoding( + qlen, + klen, + d_model, + clamp_len, + attn_type, + bi_data, + bsz = bsz, + dtype = tf_float, + ) + pos_emb = tf.layers.dropout(pos_emb, dropout, training = is_training) + + ##### Attention layers + if mems is None: + mems = [None] * n_layer + + for i in range(n_layer): + # cache new mems + new_mems.append(_cache_mem(output_h, mems[i], mem_len, reuse_len)) + + # segment bias + if seg_id is None: + r_s_bias_i = None + seg_embed_i = None + else: + r_s_bias_i = r_s_bias if not untie_r else r_s_bias[i] + seg_embed_i = seg_embed[i] + + with tf.variable_scope('layer_{}'.format(i)): + if inp_q is not None: + output_h, output_g = two_stream_rel_attn( + h = output_h, + g = output_g, + r = pos_emb, + r_w_bias = r_w_bias if not untie_r else r_w_bias[i], + r_r_bias = r_r_bias if not untie_r else r_r_bias[i], + seg_mat = seg_mat, + r_s_bias = r_s_bias_i, + seg_embed = seg_embed_i, + attn_mask_h = non_tgt_mask, + attn_mask_g = attn_mask, + mems = mems[i], + target_mapping = target_mapping, + d_model = d_model, + n_head = n_head, + d_head = d_head, + dropout = dropout, + dropatt = dropatt, + is_training = is_training, + kernel_initializer = initializer, + ) + reuse = True + else: + reuse = False + + output_h = rel_multihead_attn( + h = output_h, + r = pos_emb, + r_w_bias = r_w_bias if not untie_r else r_w_bias[i], + r_r_bias = r_r_bias if not untie_r else r_r_bias[i], + seg_mat = seg_mat, + r_s_bias = r_s_bias_i, + seg_embed = seg_embed_i, + 
attn_mask = non_tgt_mask, + mems = mems[i], + d_model = d_model, + n_head = n_head, + d_head = d_head, + dropout = dropout, + dropatt = dropatt, + is_training = is_training, + kernel_initializer = initializer, + reuse = reuse, + ) + + if inp_q is not None: + output_g = positionwise_ffn( + inp = output_g, + d_model = d_model, + d_inner = d_inner, + dropout = dropout, + kernel_initializer = initializer, + activation_type = ff_activation, + is_training = is_training, + ) + + output_h = positionwise_ffn( + inp = output_h, + d_model = d_model, + d_inner = d_inner, + dropout = dropout, + kernel_initializer = initializer, + activation_type = ff_activation, + is_training = is_training, + reuse = reuse, + ) + + if inp_q is not None: + output = tf.layers.dropout( + output_g, dropout, training = is_training + ) + else: + output = tf.layers.dropout( + output_h, dropout, training = is_training + ) + + return output, new_mems, lookup_table + + +def lm_loss( + hidden, + target, + n_token, + d_model, + initializer, + lookup_table = None, + tie_weight = False, + bi_data = True, + use_tpu = False, +): + """doc.""" + + with tf.variable_scope('lm_loss'): + if tie_weight: + assert ( + lookup_table is not None + ), 'lookup_table cannot be None for tie_weight' + softmax_w = lookup_table + else: + softmax_w = tf.get_variable( + 'weight', + [n_token, d_model], + dtype = hidden.dtype, + initializer = initializer, + ) + + softmax_b = tf.get_variable( + 'bias', + [n_token], + dtype = hidden.dtype, + initializer = tf.zeros_initializer(), + ) + + logits = tf.einsum('ibd,nd->ibn', hidden, softmax_w) + softmax_b + + if use_tpu: + one_hot_target = tf.one_hot(target, n_token, dtype = logits.dtype) + loss = -tf.reduce_sum( + tf.nn.log_softmax(logits) * one_hot_target, -1 + ) + else: + loss = tf.nn.sparse_softmax_cross_entropy_with_logits( + labels = target, logits = logits + ) + + return loss + + +def summarize_sequence( + summary_type, + hidden, + d_model, + n_head, + d_head, + dropout, + dropatt, 
+ input_mask, + is_training, + initializer, + scope = None, + reuse = None, + use_proj = True, +): + + """ + Different classification tasks may or may not share the same parameters + to summarize the sequence features. + + If shared, one can keep the `scope` at the default value `None`. + Otherwise, one should specify a different `scope` for each task. + """ + + with tf.variable_scope(scope, 'sequnece_summary', reuse = reuse): + if summary_type == 'last': + summary = hidden[-1] + elif summary_type == 'first': + summary = hidden[0] + elif summary_type == 'mean': + summary = tf.reduce_mean(hidden, axis = 0) + elif summary_type == 'attn': + bsz = tf.shape(hidden)[1] + + summary_bias = tf.get_variable( + 'summary_bias', + [d_model], + dtype = hidden.dtype, + initializer = initializer, + ) + summary_bias = tf.tile(summary_bias[None, None], [1, bsz, 1]) + + if input_mask is not None: + input_mask = input_mask[None, :, :, None] + + summary = multihead_attn( + summary_bias, + hidden, + hidden, + input_mask, + d_model, + n_head, + d_head, + dropout, + dropatt, + is_training, + initializer, + residual = False, + ) + summary = summary[0] + else: + raise ValueError('Unsupported summary type {}'.format(summary_type)) + + # use another projection as in BERT + if use_proj: + summary = tf.layers.dense( + summary, + d_model, + activation = tf.tanh, + kernel_initializer = initializer, + name = 'summary', + ) + + # dropout + summary = tf.layers.dropout( + summary, dropout, training = is_training, name = 'dropout' + ) + + return summary + + +def classification_loss( + hidden, + labels, + n_class, + initializer, + scope, + reuse = None, + return_logits = False, +): + """ + Different classification tasks should use different scope names to ensure + different dense layers (parameters) are used to produce the logits. + + An exception is transfer learning, where one hopes to transfer + the classification weights. 
+ """ + + with tf.variable_scope(scope, reuse = reuse): + logits = tf.layers.dense( + hidden, n_class, kernel_initializer = initializer, name = 'logit' + ) + + one_hot_target = tf.one_hot(labels, n_class, dtype = hidden.dtype) + loss = -tf.reduce_sum(tf.nn.log_softmax(logits) * one_hot_target, -1) + + if return_logits: + return loss, logits + + return loss + + +def regression_loss( + hidden, labels, initializer, scope, reuse = None, return_logits = False +): + with tf.variable_scope(scope, reuse = reuse): + logits = tf.layers.dense( + hidden, 1, kernel_initializer = initializer, name = 'logit' + ) + + logits = tf.squeeze(logits, axis = -1) + loss = tf.square(logits - labels) + + if return_logits: + return loss, logits + + return loss diff --git a/topic-model/prepro_utils.py b/topic-model/prepro_utils.py new file mode 100644 index 0000000..d3e4195 --- /dev/null +++ b/topic-model/prepro_utils.py @@ -0,0 +1,164 @@ +# coding=utf-8 +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import unicodedata +import six +from functools import partial + + +SPIECE_UNDERLINE = '▁' + + +def printable_text(text): + """Returns text encoded in a way suitable for print or `tf.logging`.""" + + # These functions want `str` for both Python2 and Python3, but in one case + # it's a Unicode string and in the other it's a byte string. 
+ if six.PY3: + if isinstance(text, str): + return text + elif isinstance(text, bytes): + return text.decode('utf-8', 'ignore') + else: + raise ValueError('Unsupported string type: %s' % (type(text))) + elif six.PY2: + if isinstance(text, str): + return text + elif isinstance(text, unicode): + return text.encode('utf-8') + else: + raise ValueError('Unsupported string type: %s' % (type(text))) + else: + raise ValueError('Not running on Python2 or Python 3?') + + +def print_(*args): + new_args = [] + for arg in args: + if isinstance(arg, list): + s = [printable_text(i) for i in arg] + s = ' '.join(s) + new_args.append(s) + else: + new_args.append(printable_text(arg)) + print(*new_args) + + +def preprocess_text( + inputs, lower = False, remove_space = True, keep_accents = False +): + if remove_space: + outputs = ' '.join(inputs.strip().split()) + else: + outputs = inputs + outputs = outputs.replace('``', '"').replace("''", '"') + + if six.PY2 and isinstance(outputs, str): + outputs = outputs.decode('utf-8') + + if not keep_accents: + outputs = unicodedata.normalize('NFKD', outputs) + outputs = ''.join([c for c in outputs if not unicodedata.combining(c)]) + if lower: + outputs = outputs.lower() + + return outputs + + +def encode_pieces(sp_model, text, return_unicode = True, sample = False): + # return_unicode is used only for py2 + + # note(zhiliny): in some systems, sentencepiece only accepts str for py2 + if six.PY2 and isinstance(text, unicode): + text = text.encode('utf-8') + + if not sample: + pieces = sp_model.EncodeAsPieces(text) + else: + pieces = sp_model.SampleEncodeAsPieces(text, 64, 0.1) + new_pieces = [] + for piece in pieces: + if len(piece) > 1 and piece[-1] == ',' and piece[-2].isdigit(): + cur_pieces = sp_model.EncodeAsPieces( + piece[:-1].replace(SPIECE_UNDERLINE, '') + ) + if ( + piece[0] != SPIECE_UNDERLINE + and cur_pieces[0][0] == SPIECE_UNDERLINE + ): + if len(cur_pieces[0]) == 1: + cur_pieces = cur_pieces[1:] + else: + cur_pieces[0] = 
cur_pieces[0][1:] + cur_pieces.append(piece[-1]) + new_pieces.extend(cur_pieces) + else: + new_pieces.append(piece) + + # note(zhiliny): convert back to unicode for py2 + if six.PY2 and return_unicode: + ret_pieces = [] + for piece in new_pieces: + if isinstance(piece, str): + piece = piece.decode('utf-8') + ret_pieces.append(piece) + new_pieces = ret_pieces + + return new_pieces + + +def encode_ids(sp_model, text, sample = False): + pieces = encode_pieces( + sp_model, text, return_unicode = False, sample = sample + ) + ids = [sp_model.PieceToId(piece) for piece in pieces] + return ids + + +if __name__ == '__main__': + import sentencepiece as spm + + sp = spm.SentencePieceProcessor() + sp.load('sp10m.uncased.v3.model') + + print_(u'I was born in 2000, and this is falsé.') + print_( + u'ORIGINAL', + sp.EncodeAsPieces(u'I was born in 2000, and this is falsé.'), + ) + print_( + u'OURS', encode_pieces(sp, u'I was born in 2000, and this is falsé.') + ) + print(encode_ids(sp, u'I was born in 2000, and this is falsé.')) + print_('') + prepro_func = partial(preprocess_text, lower = True) + print_(prepro_func('I was born in 2000, and this is falsé.')) + print_( + 'ORIGINAL', + sp.EncodeAsPieces( + prepro_func('I was born in 2000, and this is falsé.') + ), + ) + print_( + 'OURS', + encode_pieces( + sp, prepro_func('I was born in 2000, and this is falsé.') + ), + ) + print(encode_ids(sp, prepro_func('I was born in 2000, and this is falsé.'))) + print_('') + print_('I was born in 2000, and this is falsé.') + print_( + 'ORIGINAL', sp.EncodeAsPieces('I was born in 2000, and this is falsé.') + ) + print_('OURS', encode_pieces(sp, 'I was born in 2000, and this is falsé.')) + print(encode_ids(sp, 'I was born in 2000, and this is falsé.')) + print_('') + print_('I was born in 92000, and this is falsé.') + print_( + 'ORIGINAL', sp.EncodeAsPieces('I was born in 92000, and this is falsé.') + ) + print_('OURS', encode_pieces(sp, 'I was born in 92000, and this is falsé.')) + 
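The digit-comma branch in `encode_pieces` above re-splits pieces such as `▁2000,` so the trailing comma becomes its own piece. That post-processing can be exercised without a SentencePiece model; in this sketch `resplit` and `split_digit_comma` are hypothetical stand-ins (`resplit` plays the role of `sp_model.EncodeAsPieces`):

```python
SPIECE_UNDERLINE = '▁'

def split_digit_comma(pieces, resplit):
    # same rule as in encode_pieces: a piece ending in "<digit>," is
    # re-encoded without the comma, then the comma is appended back
    new_pieces = []
    for piece in pieces:
        if len(piece) > 1 and piece[-1] == ',' and piece[-2].isdigit():
            cur_pieces = resplit(piece[:-1].replace(SPIECE_UNDERLINE, ''))
            if (
                piece[0] != SPIECE_UNDERLINE
                and cur_pieces[0][0] == SPIECE_UNDERLINE
            ):
                if len(cur_pieces[0]) == 1:
                    cur_pieces = cur_pieces[1:]
                else:
                    cur_pieces[0] = cur_pieces[0][1:]
            cur_pieces.append(piece[-1])
            new_pieces.extend(cur_pieces)
        else:
            new_pieces.append(piece)
    return new_pieces

resplit = lambda text: [SPIECE_UNDERLINE + text]  # toy re-splitter
print(split_digit_comma(['▁born', '▁in', '▁2000,'], resplit))
# → ['▁born', '▁in', '▁2000', ',']
```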
print(encode_ids(sp, 'I was born in 92000, and this is falsé.')) diff --git a/embedded/utils.py b/topic-model/utils.py similarity index 100% rename from embedded/utils.py rename to topic-model/utils.py diff --git a/topic-model/xlnet.py b/topic-model/xlnet.py new file mode 100644 index 0000000..6e49589 --- /dev/null +++ b/topic-model/xlnet.py @@ -0,0 +1,328 @@ +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import json +import os +import tensorflow as tf +import modeling + + +def _get_initializer(FLAGS): + """Get variable initializer.""" + if FLAGS.init == 'uniform': + initializer = tf.initializers.random_uniform( + minval = -FLAGS.init_range, maxval = FLAGS.init_range, seed = None + ) + elif FLAGS.init == 'normal': + initializer = tf.initializers.random_normal( + stddev = FLAGS.init_std, seed = None + ) + else: + raise ValueError('Initializer {} not supported'.format(FLAGS.init)) + return initializer + + +class XLNetConfig(object): + """XLNetConfig contains hyperparameters that are specific to a model checkpoint; + i.e., these hyperparameters should be the same between + pretraining and finetuning. + + The following hyperparameters are defined: + n_layer: int, the number of layers. + d_model: int, the hidden size. + n_head: int, the number of attention heads. + d_head: int, the dimension size of each attention head. + d_inner: int, the hidden size in feed-forward layers. + ff_activation: str, "relu" or "gelu". + untie_r: bool, whether to untie the biases in attention. + n_token: int, the vocab size. + """ + + def __init__(self, FLAGS = None, json_path = None): + """Constructing an XLNetConfig. 
+ One of FLAGS or json_path should be provided.""" + + assert FLAGS is not None or json_path is not None + + self.keys = [ + 'n_layer', + 'd_model', + 'n_head', + 'd_head', + 'd_inner', + 'ff_activation', + 'untie_r', + 'n_token', + ] + + if FLAGS is not None: + self.init_from_flags(FLAGS) + + if json_path is not None: + self.init_from_json(json_path) + + def init_from_flags(self, FLAGS): + for key in self.keys: + setattr(self, key, getattr(FLAGS, key)) + + def init_from_json(self, json_path): + with tf.gfile.Open(json_path) as f: + json_data = json.load(f) + for key in self.keys: + setattr(self, key, json_data[key]) + + def to_json(self, json_path): + """Save XLNetConfig to a json file.""" + json_data = {} + for key in self.keys: + json_data[key] = getattr(self, key) + + json_dir = os.path.dirname(json_path) + if not tf.gfile.Exists(json_dir): + tf.gfile.MakeDirs(json_dir) + with tf.gfile.Open(json_path, 'w') as f: + json.dump(json_data, f, indent = 4, sort_keys = True) + + +def create_run_config(is_training, is_finetune, FLAGS): + kwargs = dict( + is_training = is_training, + use_tpu = FLAGS.use_tpu, + use_bfloat16 = FLAGS.use_bfloat16, + dropout = FLAGS.dropout, + dropatt = FLAGS.dropatt, + init = FLAGS.init, + init_range = FLAGS.init_range, + init_std = FLAGS.init_std, + clamp_len = FLAGS.clamp_len, + ) + + if not is_finetune: + kwargs.update( + dict( + mem_len = FLAGS.mem_len, + reuse_len = FLAGS.reuse_len, + bi_data = FLAGS.bi_data, + clamp_len = FLAGS.clamp_len, + same_length = FLAGS.same_length, + ) + ) + + return RunConfig(**kwargs) + + +class RunConfig(object): + """RunConfig contains hyperparameters that could be different + between pretraining and finetuning. + These hyperparameters can also be changed from run to run. + We store them separately from XLNetConfig for flexibility. 
+ """ + + def __init__( + self, + is_training, + use_tpu, + use_bfloat16, + dropout, + dropatt, + init = 'normal', + init_range = 0.1, + init_std = 0.02, + mem_len = None, + reuse_len = None, + bi_data = False, + clamp_len = -1, + same_length = False, + ): + """ + Args: + is_training: bool, whether in training mode. + use_tpu: bool, whether TPUs are used. + use_bfloat16: bool, use bfloat16 instead of float32. + dropout: float, dropout rate. + dropatt: float, dropout rate on attention probabilities. + init: str, the initialization scheme, either "normal" or "uniform". + init_range: float, initialize the parameters with a uniform distribution + in [-init_range, init_range]. Only effective when init="uniform". + init_std: float, initialize the parameters with a normal distribution + with mean 0 and stddev init_std. Only effective when init="normal". + mem_len: int, the number of tokens to cache. + reuse_len: int, the number of tokens in the currect batch to be cached + and reused in the future. + bi_data: bool, whether to use bidirectional input pipeline. + Usually set to True during pretraining and False during finetuning. + clamp_len: int, clamp all relative distances larger than clamp_len. + -1 means no clamping. + same_length: bool, whether to use the same attention length for each token. 
+ """ + + self.init = init + self.init_range = init_range + self.init_std = init_std + self.is_training = is_training + self.dropout = dropout + self.dropatt = dropatt + self.use_tpu = use_tpu + self.use_bfloat16 = use_bfloat16 + self.mem_len = mem_len + self.reuse_len = reuse_len + self.bi_data = bi_data + self.clamp_len = clamp_len + self.same_length = same_length + + +class XLNetModel(object): + """A wrapper of the XLNet model used during both pretraining and finetuning.""" + + def __init__( + self, + xlnet_config, + run_config, + input_ids, + seg_ids, + input_mask, + mems = None, + perm_mask = None, + target_mapping = None, + inp_q = None, + **kwargs + ): + """ + Args: + xlnet_config: XLNetConfig, + run_config: RunConfig, + input_ids: int32 Tensor in shape [len, bsz], the input token IDs. + seg_ids: int32 Tensor in shape [len, bsz], the input segment IDs. + input_mask: float32 Tensor in shape [len, bsz], the input mask. + 0 for real tokens and 1 for padding. + mems: a list of float32 Tensors in shape [mem_len, bsz, d_model], memory + from previous batches. The length of the list equals n_layer. + If None, no memory is used. + perm_mask: float32 Tensor in shape [len, len, bsz]. + If perm_mask[i, j, k] = 0, i attend to j in batch k; + if perm_mask[i, j, k] = 1, i does not attend to j in batch k. + If None, each position attends to all the others. + target_mapping: float32 Tensor in shape [num_predict, len, bsz]. + If target_mapping[i, j, k] = 1, the i-th predict in batch k is + on the j-th token. + Only used during pretraining for partial prediction. + Set to None during finetuning. + inp_q: float32 Tensor in shape [len, bsz]. + 1 for tokens with losses and 0 for tokens without losses. + Only used during pretraining for two-stream attention. + Set to None during finetuning. 
+ """ + + initializer = _get_initializer(run_config) + + tfm_args = dict( + n_token = xlnet_config.n_token, + initializer = initializer, + attn_type = 'bi', + n_layer = xlnet_config.n_layer, + d_model = xlnet_config.d_model, + n_head = xlnet_config.n_head, + d_head = xlnet_config.d_head, + d_inner = xlnet_config.d_inner, + ff_activation = xlnet_config.ff_activation, + untie_r = xlnet_config.untie_r, + is_training = run_config.is_training, + use_bfloat16 = run_config.use_bfloat16, + use_tpu = run_config.use_tpu, + dropout = run_config.dropout, + dropatt = run_config.dropatt, + mem_len = run_config.mem_len, + reuse_len = run_config.reuse_len, + bi_data = run_config.bi_data, + clamp_len = run_config.clamp_len, + same_length = run_config.same_length, + ) + + input_args = dict( + inp_k = input_ids, + seg_id = seg_ids, + input_mask = input_mask, + mems = mems, + perm_mask = perm_mask, + target_mapping = target_mapping, + inp_q = inp_q, + ) + tfm_args.update(input_args) + + with tf.variable_scope('model', reuse = tf.AUTO_REUSE): + ( + self.output, + self.new_mems, + self.lookup_table, + ) = modeling.transformer_xl(**tfm_args) + + self.input_mask = input_mask + self.initializer = initializer + self.xlnet_config = xlnet_config + self.run_config = run_config + + def get_pooled_out(self, summary_type, use_summ_proj = True): + """ + Args: + summary_type: str, "last", "first", "mean", or "attn". The method + to pool the input to get a vector representation. + use_summ_proj: bool, whether to use a linear projection during pooling. + + Returns: + float32 Tensor in shape [bsz, d_model], the pooled representation. 
+ """ + + xlnet_config = self.xlnet_config + run_config = self.run_config + + with tf.variable_scope('model', reuse = tf.AUTO_REUSE): + summary = modeling.summarize_sequence( + summary_type = summary_type, + hidden = self.output, + d_model = xlnet_config.d_model, + n_head = xlnet_config.n_head, + d_head = xlnet_config.d_head, + dropout = run_config.dropout, + dropatt = run_config.dropatt, + is_training = run_config.is_training, + input_mask = self.input_mask, + initializer = self.initializer, + use_proj = use_summ_proj, + ) + + return summary + + def get_sequence_output(self): + """ + Returns: + float32 Tensor in shape [len, bsz, d_model]. The last layer hidden + representation of XLNet. + """ + + return self.output + + def get_new_memory(self): + """ + Returns: + list of float32 Tensors in shape [mem_len, bsz, d_model], the new + memory that concatenates the previous memory with the current input + representations. + The length of the list equals n_layer. + """ + return self.new_mems + + def get_embedding_table(self): + """ + Returns: + float32 Tensor in shape [n_token, d_model]. The embedding lookup table. + Used for tying embeddings between input and output layers. + """ + return self.lookup_table + + def get_initializer(self): + """ + Returns: + A tf initializer. Used to initialize variables in layers on top of XLNet. 
+ """ + return self.initializer diff --git a/extractive-summarization/1.skip-thought.ipynb b/unsupervised-extractive-summarization/1.skip-thought.ipynb similarity index 100% rename from extractive-summarization/1.skip-thought.ipynb rename to unsupervised-extractive-summarization/1.skip-thought.ipynb diff --git a/extractive-summarization/2.residual-network.ipynb b/unsupervised-extractive-summarization/2.residual-network.ipynb similarity index 100% rename from extractive-summarization/2.residual-network.ipynb rename to unsupervised-extractive-summarization/2.residual-network.ipynb diff --git a/extractive-summarization/3.residual-network-bahdanau.ipynb b/unsupervised-extractive-summarization/3.residual-network-bahdanau.ipynb similarity index 100% rename from extractive-summarization/3.residual-network-bahdanau.ipynb rename to unsupervised-extractive-summarization/3.residual-network-bahdanau.ipynb diff --git a/extractive-summarization/README.md b/unsupervised-extractive-summarization/README.md similarity index 100% rename from extractive-summarization/README.md rename to unsupervised-extractive-summarization/README.md diff --git a/extractive-summarization/books/Blood_Born b/unsupervised-extractive-summarization/books/Blood_Born similarity index 100% rename from extractive-summarization/books/Blood_Born rename to unsupervised-extractive-summarization/books/Blood_Born diff --git a/extractive-summarization/books/Dark_Thirst b/unsupervised-extractive-summarization/books/Dark_Thirst similarity index 100% rename from extractive-summarization/books/Dark_Thirst rename to unsupervised-extractive-summarization/books/Dark_Thirst diff --git a/extractive-summarization/books/Driftas_Quest b/unsupervised-extractive-summarization/books/Driftas_Quest similarity index 100% rename from extractive-summarization/books/Driftas_Quest rename to unsupervised-extractive-summarization/books/Driftas_Quest diff --git a/embedded/1.cbow-softmax.ipynb b/vectorizer/1.cbow-softmax.ipynb similarity 
index 100% rename from embedded/1.cbow-softmax.ipynb rename to vectorizer/1.cbow-softmax.ipynb diff --git a/embedded/10.fast-text.ipynb b/vectorizer/10.fast-text.ipynb similarity index 100% rename from embedded/10.fast-text.ipynb rename to vectorizer/10.fast-text.ipynb diff --git a/embedded/11.elmo.ipynb b/vectorizer/11.elmo.ipynb similarity index 100% rename from embedded/11.elmo.ipynb rename to vectorizer/11.elmo.ipynb diff --git a/embedded/12.bert-batch-all-triplet-loss.ipynb b/vectorizer/12.bert-batch-all-triplet-loss.ipynb similarity index 100% rename from embedded/12.bert-batch-all-triplet-loss.ipynb rename to vectorizer/12.bert-batch-all-triplet-loss.ipynb diff --git a/embedded/2.cbow-nce.ipynb b/vectorizer/2.cbow-nce.ipynb similarity index 100% rename from embedded/2.cbow-nce.ipynb rename to vectorizer/2.cbow-nce.ipynb diff --git a/embedded/3.skipgram-softmax.ipynb b/vectorizer/3.skipgram-softmax.ipynb similarity index 100% rename from embedded/3.skipgram-softmax.ipynb rename to vectorizer/3.skipgram-softmax.ipynb diff --git a/embedded/4.skipgram-nce.ipynb b/vectorizer/4.skipgram-nce.ipynb similarity index 100% rename from embedded/4.skipgram-nce.ipynb rename to vectorizer/4.skipgram-nce.ipynb diff --git a/vectorizer/5.lda2vec.ipynb b/vectorizer/5.lda2vec.ipynb new file mode 100644 index 0000000..4396899 --- /dev/null +++ b/vectorizer/5.lda2vec.ipynb @@ -0,0 +1,544 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from utils import *\n", + "import tensorflow as tf\n", + "from collections import Counter\n", + "from sklearn.feature_extraction.text import CountVectorizer" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['negative', 'positive']\n", + "10662\n", + "10662\n" + ] + } + ], + "source": [ + "trainset = sklearn.datasets.load_files(container_path = 'data', encoding = 
'UTF-8')\n", + "trainset.data, trainset.target = separate_dataset(trainset,1.0)\n", + "print(trainset.target_names)\n", + "print(len(trainset.data))\n", + "print(len(trainset.target))" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "window_size = 2\n", + "n_topics = 10\n", + "embedding_size = 128\n", + "epoch = 5\n", + "switch_loss = 3" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "class LDA2VEC:\n", + " def __init__(\n", + " self,\n", + " num_unique_documents,\n", + " vocab_size,\n", + " num_topics,\n", + " freqs,\n", + " embedding_size = 128,\n", + " num_sampled = 40,\n", + " learning_rate = 1e-3,\n", + " lmbda = 150.0,\n", + " alpha = None,\n", + " power = 0.75,\n", + " batch_size = 32,\n", + " clip_gradients = 5.0,\n", + " **kwargs\n", + " ):\n", + " moving_avgs = tf.train.ExponentialMovingAverage(0.9)\n", + " self.batch_size = batch_size\n", + " self.freqs = freqs\n", + " self.sess = tf.InteractiveSession()\n", + "\n", + " self.X = tf.placeholder(tf.int32, shape = [None])\n", + " self.Y = tf.placeholder(tf.int64, shape = [None])\n", + " self.DOC = tf.placeholder(tf.int32, shape = [None])\n", + " step = tf.Variable(0, trainable = False, name = 'global_step')\n", + " self.switch_loss = tf.Variable(0, trainable = False)\n", + " train_labels = tf.reshape(self.Y, [-1, 1])\n", + " sampler = tf.nn.fixed_unigram_candidate_sampler(\n", + " train_labels,\n", + " num_true = 1,\n", + " num_sampled = num_sampled,\n", + " unique = True,\n", + " range_max = vocab_size,\n", + " distortion = power,\n", + " unigrams = self.freqs,\n", + " )\n", + "\n", + " self.word_embedding = tf.Variable(\n", + " tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0)\n", + " )\n", + " self.nce_weights = tf.Variable(\n", + " tf.truncated_normal(\n", + " [vocab_size, embedding_size],\n", + " stddev = tf.sqrt(1 / embedding_size),\n", + " )\n", + " )\n", + " 
self.nce_biases = tf.Variable(tf.zeros([vocab_size]))\n", + " scalar = 1 / np.sqrt(num_unique_documents + num_topics)\n", + " self.doc_embedding = tf.Variable(\n", + " tf.random_normal(\n", + " [num_unique_documents, num_topics],\n", + " mean = 0,\n", + " stddev = 50 * scalar,\n", + " )\n", + " )\n", + " self.topic_embedding = tf.get_variable(\n", + " 'topic_embedding',\n", + " shape = [num_topics, embedding_size],\n", + " dtype = tf.float32,\n", + " initializer = tf.orthogonal_initializer(gain = scalar),\n", + " )\n", + " pivot = tf.nn.embedding_lookup(self.word_embedding, self.X)\n", + " proportions = tf.nn.embedding_lookup(self.doc_embedding, self.DOC)\n", + " doc = tf.matmul(proportions, self.topic_embedding)\n", + " doc_context = doc\n", + " word_context = pivot\n", + " context = tf.add(word_context, doc_context)\n", + " loss_word2vec = tf.reduce_mean(\n", + " tf.nn.nce_loss(\n", + " weights = self.nce_weights,\n", + " biases = self.nce_biases,\n", + " labels = self.Y,\n", + " inputs = context,\n", + " num_sampled = num_sampled,\n", + " num_classes = vocab_size,\n", + " num_true = 1,\n", + " sampled_values = sampler,\n", + " )\n", + " )\n", + " self.fraction = tf.Variable(1, trainable = False, dtype = tf.float32)\n", + "\n", + " n_topics = self.doc_embedding.get_shape()[1].value\n", + " log_proportions = tf.nn.log_softmax(self.doc_embedding)\n", + " if alpha is None:\n", + " alpha = 1.0 / n_topics\n", + " loss = -(alpha - 1) * log_proportions\n", + " prior = tf.reduce_sum(loss)\n", + "\n", + " loss_lda = lmbda * self.fraction * prior\n", + " self.cost = tf.cond(\n", + " step < self.switch_loss,\n", + " lambda: loss_word2vec,\n", + " lambda: loss_word2vec + loss_lda,\n", + " )\n", + " loss_avgs_op = moving_avgs.apply([loss_lda, loss_word2vec, self.cost])\n", + " with tf.control_dependencies([loss_avgs_op]):\n", + " self.optimizer = tf.contrib.layers.optimize_loss(\n", + " self.cost,\n", + " tf.train.get_global_step(),\n", + " learning_rate,\n", + " 'Adam',\n", 
+ " clip_gradients = clip_gradients,\n", + " )\n", + " self.sess.run(tf.global_variables_initializer())\n", + "\n", + " def train(\n", + " self, pivot_words, target_words, doc_ids, num_epochs, switch_loss = 3\n", + " ):\n", + " from tqdm import tqdm\n", + "\n", + " temp_fraction = self.batch_size / len(pivot_words)\n", + " self.sess.run(tf.assign(self.fraction, temp_fraction))\n", + " self.sess.run(tf.assign(self.switch_loss, switch_loss))\n", + " for e in range(num_epochs):\n", + " pbar = tqdm(\n", + " range(0, len(pivot_words), self.batch_size),\n", + " desc = 'minibatch loop',\n", + " )\n", + " for i in pbar:\n", + " batch_x = pivot_words[\n", + " i : min(i + self.batch_size, len(pivot_words))\n", + " ]\n", + " batch_y = target_words[\n", + " i : min(i + self.batch_size, len(pivot_words))\n", + " ]\n", + " batch_doc = doc_ids[\n", + " i : min(i + self.batch_size, len(pivot_words))\n", + " ]\n", + " _, cost = self.sess.run(\n", + " [self.optimizer, self.cost],\n", + " feed_dict = {\n", + " self.X: batch_x,\n", + " self.Y: batch_y,\n", + " self.DOC: batch_doc,\n", + " },\n", + " )\n", + " pbar.set_postfix(cost = cost, epoch = e + 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "from sklearn.utils import shuffle\n", + "\n", + "def skipgrams(\n", + " sequence,\n", + " vocabulary_size,\n", + " window_size = 4,\n", + " negative_samples = 1.0,\n", + " shuffle = True,\n", + " categorical = False,\n", + " sampling_table = None,\n", + " seed = None,\n", + "):\n", + " couples = []\n", + " labels = []\n", + " for i, wi in enumerate(sequence):\n", + " if not wi:\n", + " continue\n", + " if sampling_table is not None:\n", + " if sampling_table[wi] < random.random():\n", + " continue\n", + "\n", + " window_start = max(0, i - window_size)\n", + " window_end = min(len(sequence), i + window_size + 1)\n", + " for j in range(window_start, window_end):\n", + " if j != i:\n", + " wj = 
sequence[j]\n", + " if not wj:\n", + " continue\n", + " couples.append([wi, wj])\n", + " if categorical:\n", + " labels.append([0, 1])\n", + " else:\n", + " labels.append(1)\n", + "\n", + " if negative_samples > 0:\n", + " num_negative_samples = int(len(labels) * negative_samples)\n", + " words = [c[0] for c in couples]\n", + " random.shuffle(words)\n", + "\n", + " couples += [\n", + " [words[i % len(words)], random.randint(1, vocabulary_size - 1)]\n", + " for i in range(num_negative_samples)\n", + " ]\n", + " if categorical:\n", + " labels += [[1, 0]] * num_negative_samples\n", + " else:\n", + " labels += [0] * num_negative_samples\n", + "\n", + " if shuffle:\n", + " if seed is None:\n", + " seed = random.randint(0, 10e6)\n", + " random.seed(seed)\n", + " random.shuffle(couples)\n", + " random.seed(seed)\n", + " random.shuffle(labels)\n", + "\n", + " return couples, labels" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "bow = CountVectorizer().fit(trainset.data)\n", + "transformed = bow.transform(trainset.data)\n", + "idx_text_clean, len_idx_text_clean = [], []\n", + "for text in transformed:\n", + " splitted = text.nonzero()[1]\n", + " idx_text_clean.append(splitted)\n", + " \n", + "dictionary = {\n", + " i: no for no, i in enumerate(bow.get_feature_names())\n", + " }\n", + "reversed_dictionary = {\n", + " no: i for no, i in enumerate(bow.get_feature_names())\n", + " }\n", + "freqs = transformed.toarray().sum(axis = 0).tolist()\n", + "doc_ids = np.arange(len(idx_text_clean))\n", + "num_unique_documents = doc_ids.max()\n", + "pivot_words, target_words, doc_ids = [], [], []\n", + "for i, t in enumerate(idx_text_clean):\n", + " pairs, _ = skipgrams(\n", + " t,\n", + " vocabulary_size = len(dictionary),\n", + " window_size = window_size,\n", + " shuffle = True,\n", + " negative_samples = 0,\n", + " )\n", + " for pair in pairs:\n", + " temp_data = pair\n", + " pivot_words.append(temp_data[0])\n", + " 
target_words.append(temp_data[1])\n", + " doc_ids.append(i)\n", + "pivot_words, target_words, doc_ids = shuffle(\n", + " pivot_words, target_words, doc_ids, random_state = 10\n", + ")\n", + "num_unique_documents = len(idx_text_clean)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "model = LDA2VEC(\n", + " num_unique_documents,\n", + " len(dictionary),\n", + " n_topics,\n", + " freqs,\n", + " embedding_size = embedding_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "minibatch loop: 100%|██████████| 11987/11987 [01:40<00:00, 118.55it/s, cost=-1.85e+4, epoch=1]\n", + "minibatch loop: 100%|██████████| 11987/11987 [01:40<00:00, 119.28it/s, cost=-4.34e+4, epoch=2]\n", + "minibatch loop: 100%|██████████| 11987/11987 [01:40<00:00, 119.51it/s, cost=-6.76e+4, epoch=3]\n", + "minibatch loop: 100%|██████████| 11987/11987 [01:40<00:00, 119.69it/s, cost=-9.17e+4, epoch=4]\n", + "minibatch loop: 100%|██████████| 11987/11987 [01:40<00:00, 119.71it/s, cost=-1.16e+5, epoch=5]\n" + ] + } + ], + "source": [ + "model.train(\n", + " pivot_words, target_words, doc_ids, epoch, switch_loss = switch_loss\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "doc_embed = model.sess.run(model.doc_embedding)\n", + "topic_embed = model.sess.run(model.topic_embedding)\n", + "word_embed = model.sess.run(model.word_embedding)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((10662, 10), (10, 128), (20306, 128))" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "doc_embed.shape, topic_embed.shape, word_embed.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], 
+ "source": [ + "from scipy.spatial.distance import cdist\n", + "from sklearn.neighbors import NearestNeighbors" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[['friend', 0.44515836238861084],\n", + " ['capture', 0.4315608739852905],\n", + " ['american', 0.41776609420776367],\n", + " ['art', 0.41060054302215576],\n", + " ['cashin', 0.4037929177284241],\n", + " ['awkwardly', 0.40309447050094604],\n", + " ['gifted', 0.4017599821090698],\n", + " ['brisk', 0.397861123085022],\n", + " ['come', 0.38802099227905273]]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word = 'beautiful'\n", + "nn = NearestNeighbors(10, metric = 'cosine').fit(word_embed)\n", + "distances, idx = nn.kneighbors(word_embed[dictionary[word]].reshape((1, -1)))\n", + "word_list = []\n", + "for i in range(1, idx.shape[1]):\n", + " word_list.append([reversed_dictionary[idx[0, i]], 1 - distances[0, i]])\n", + "word_list" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "topic 1 : perch imaxy regions addessi astronauts seacoast fisk divining wellnigh\n", + "topic 2 : bitchy confirming worldview terrorists clich diatribe filter exquisitely billy\n", + "topic 3 : livingroom marivauxs privates plunges andespeciallyto establishing association emi auteils\n", + "topic 4 : unfathomable utilizing twinkling tuna unbroken bigwave capitalizes awards leash\n", + "topic 5 : senior nicholson massoud molto widen disgracefully racked bearing organize\n", + "topic 6 : brats clutchy utilizing versace upfront andie sterotypes triplecrosses shecute\n", + "topic 7 : auteils buoy bastards gobble upfront transforma victimized pre911 sidewalks\n", + "topic 8 : smokey barrie venomous trudge arguing lux bludgeon predecesora weasels\n", + "topic 9 : skeleton cradles unholy 
ryder indieflick predawn combo realitysnubbing augmented\n", + "topic 10 : bottomfeeder disorienting bedroom topkapi sandal convict shield becks mixer\n" + ] + } + ], + "source": [ + "components = topic_embed.dot(word_embed.T)\n", + "for no, topic in enumerate(components):\n", + " topic_string = ' '.join([reversed_dictionary[i]\n", + " for i in topic.argsort()[: -10 : -1]])\n", + " print('topic %d : %s'%(no + 1, topic_string))" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "from MulticoreTSNE import MulticoreTSNE as TSNE" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "tsne = TSNE(n_jobs=4)\n", + "X = tsne.fit_transform(doc_embed.astype('float64'))" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "sns.set()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.preprocessing import LabelEncoder\n", + "unique_label = np.unique(trainset.target)\n", + "encoded = LabelEncoder().fit_transform(trainset.target)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA7EAAAIRCAYAAACcSDoDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3Xt8VPWd+P/XmZlkMpmAZBIGkckFuza2bFuLbLvdFXW/rda2Vu3G8hCCQlCQgEIbW1MzKlIdKl1li1zCPUi52KzTlqqtytddi65ft0v5uVq6prYll4lKIAmQTEIuM+f3x2QmM5lz5pIEMgnv5+Pho2XmzJnPzJw5mff5vD/vt6KqKkIIIYQQQgghxFhgGO0BCCGEEEIIIYQQiZIgVgghhBBCCCHEmCFBrBBCCCGEEEKIMUOCWCGEEEIIIYQQY4YEsUIIIYQQQgghxgwJYoUQQgghhBBCjBkSxAohhBBCCCGEGDMkiBVCCCGEEEIIMWZIECuEEEIIIYQQYsyQIFYIIYQQQgghxJghQawQQgghhBBCiDHDNNoDiMMM/B3wEeAb5bEIIYQQQgghhBhZRmAq8N9AdyIPSPUg9u+AN0Z7EEIIIYQQQgghzqvZwJuJbJjqQexHAG1tXvx+dbTHIs6jnJwsWlo6RnsYQkSRY1OkIjkuRaqSY1OkIjkuU5vBoJCdbYX+2C8RqR7E+gD8flWC2IuAfMYiVcmxKVKRHJciVcmxKVKRHJdjQsLLR6WwkxBCCCGEEEKIMUOCWCGEEEIIIYQQY4YEsUIIIYQQQgghxoxUXxMrhBBCCCGEEBeEz9dHW9tJ+vp6Rnso447JlE529mSMxuGHoBLECiGEEEIIIQTQ1naSjIxMrNZLURRltIczbqiqitd7lra2k+TmTh32/iSdWAghhBBCCCGAvr4erNaJEsCOMEVRsFonjtgMtwSxQgghhBBCCNFPAtjzYyTfVwlihRBCCCGEEOIi98EHtbz22qGI2xYunEd397lRGpE+CWKFEEIIIYQQ4iL3wQd/4j/+IzKI3b17P2ZzxiiNSJ8UdhJCCCGEEEKIITK7a7C6VmNo8uCf5sDrXEV38ZwR2fc118xiyZJlHD78OmfOnGH58hVcf/2XATh27A9s2bIBr9cLwD33LOUf/uEaANzun/Fv//YcWVkT+NKX/pGf/7yGl156jb6+Ph588DucOXOG7u5uPv3pGXz/+5V0dnrZsWMLnZ1eFi6cx1VXfZ7vfOf7XHPNLF599TBvvPE6r7/+7/zoR08B0NfXR3HxzVRV7eSyy6axd+9ufvvbf8fn85Gba6eiwklOTu6IvAdaJIgVQgghhBBCiCEwu2uYUH4/SlcXAEZPIxPK7wcYsUDWarWyY8ce3n33HR599CGuv/7LtLe389RTa/iXf3mG3NxcTp06xeLFd7Fnz884ceJjfvrT3VRX7yc7O5uf/OSp0L6MRiOrVj3BJZdMQlVVnnhiFS+9dJDbbrude+5ZyltvvcETT/w4agzXXfd/eOaZpzl9+jSTJk3i7bffoqCgkMsum8Yrr/yapqYmtm7djcFg4Be/eJ6NG3/CqlVPjMjr1yJBrBBCCCGEEEIMgdW1OhTABildXVhdq0csiP3yl78KwIwZn+HUqZN0d3fzhz/8Dx999CHf+96KgedVFJqaGnnvvXf50pf+kezsbAC+8Y1bOHToNwD4/X4OHNjL22+/hd/vo729nYyM+OnCGRkZzJ59PYcOvcy3v30Hv/nNC3ztazcD8Oabh3n//f9l0aL5QKDXblZW1oi8dj0SxAohhBBCCCHEEBiaPEndPhTp6elAYBYVwOfzoarwiU9cwaZN26O2f++9d3X3dejQy7z77jts3rydzEwre/bsorGxIaFxfO1r32T9+qe48cabeOedozzyyONAoAfsggWLuPnmW5N9aUMmhZ2EEEIIIYQQYgj80xxJ3T5S/vZvP4vH08DRo0dCt/3v/x5
DVVWuumomb7/9FqdPnwbg5ZdfDG3T0dHOJZdMIjPTSkdHB4cOvRy6z2oN3Kbnc5+7is5OL1u2bGL27OtDM7jXXHMtv/jF85w9exaAnp4ePvjgTyP6egeTIFYIIYQQQgghhsDrXIVqsUTcploseJ2rzuvzTpw4kSefXMeuXdtYsGAuJSW3s2vXNlRV5YorPsm8eXexdGkpixbNx2g0YrUG0ntvuulmOjs7mTevmIqK7/K5z30+tM+rr/4C586dY8GCufzkJ/+i+bw33fQNXnjhF3zta9+MuO3GG2/i/vuXsGDBHdx993zee+9/zuvrV1RVPa9PMEyFwPGWlg78/pQepximyZMncPJk+2gPQ4gocmyKVCTHpUhVcmyKVJTMcfnxx/VcemlBUvs/n9WJh6qz00tmphWAnTu30tTk4dFHHx/VMYH2+2swKOTkZAFMB+oS2Y+siRVCCCGEEEKIIeounjPqQetgVVUbee+9/6Gvr5fLLpvGgw86R3tII0qCWCGEEEIIIYQYRx54oGK0h3BeyZpYIYQQQgghhBBjhgSxQghxkXjo9gam2i3Y7VlMtVt46PbESuoLIYQQQqQSCWKFEOIi8NDtDew8/Gl8mAAFHyZ2Hv60BLJCCCGEGHMkiBVCiIvArsNXAsqgWxV2Hv4UtpkzyJ1yCbaZMzC7a0ZjeEIIIYQQCZMgVgghLgKq7ule4X7PgyiqitHTyITy+yWQFUIIIURKkyBWCCEuagpVLGcfcwP/6urC6lo9ymMa+9xuEzNnWpkyJYuZM6243dIMQAghxOj65S+f52c/2wfABx/U8tprhyLuX7hwHt3d50ZjaEmTv6pCCHERUFBRo9KJB+69ly2UcAAAg6fxwg1sHLoqv4cPz2URTN/2eBTKyzOAcxQX943q2IQQQly8brvt9tD//+CDP/HWW2/w5S/fELpt9+79ozGsIZGZWCGEuAjoB7ABXiYM/ENRJKU4AYNnWw9WHOF6+3E+PGdj8Prjri4Fl8s8OgMVQghxXp3P7JtrrpnFzp1bWbhwHnPn/jOvv/5a6L63336L0tJ5LFhwBytXluHpvwjd0FDHvfeWsmDBXO68cw779/8UgJ07t7Jx4084c+Y0O3Zs4ciR37Fw4Tx+8pN/CT1XZ2cnr7zyax566Huh5+nr6+PWW2/iww+bANi7dzeLF9/FokUlPPjgd2lpOTVirzdRMhMrhBAXASO+/srE8SmqitW1mu7iOed5VGOX222ivDyDrq6B2dbF1df336t9waCpKfaFBCGEEGOP1t+Dkc6+MRgM7N69n4aGOpYuvZvPfe7zADzxxKNs2LCN6dMv58UXf8nq1Q+zffuz/Pznz3PNNddy552lAJw9ezZif5dcMol77lnKW2+9wRNP/Djq+a677v/wzDNPc/r0aSZNmsTbb79FQUEhl102jVde+TVNTU1s3bobg8HAL37xPBs3/oRVq54YkdeaKJmJFUKIi4AvgdN9Ls2htbGGJs/5HtKY5nKZQz9YBijoBbAA06ap53VMQgghLjytvwcjnX1z8823ApCfX8gnP1nEsWPvcezYH/jEJz7J9OmXA/D1r9/Cn//8Jzo7vVx11ed54YVfsn17Fb///X8zYcKEWLuPkpGRwezZ13Po0MsA/OY3L/C1r90MwJtvHubIkd+xaNF8Fi6cx89/XsPHH384Yq81UTITK4QQ45zZXUM+X6aBwhhbKbQwmSVsB+COaW9ekLGNVcnPqqo4nd3nZSxCCCFGj97fg9HMvrn++i/zt3/7WX73u7fZu3c3L730Kx599PGk9vG1r32T9euf4sYbb+Kdd47yyCOBx6uqyoIFi0KB9WiRmVghhBjHzO4aJpTfzxoqycQbd/tOrMxnH9M7j0lFXQ1ut4miIitqUpOqKpdltEpRJyGEGIf0smxGMvvmpZd+BUBjYwMffFDLjBmfYcaMz/CXv/yJ+vo6AH7zmxe54ooiMjOteDyN2Gw5fP3r36S0dDF//OOxqH1
arVY6Ojp0n/Nzn7uKzk4vW7ZsYvbs68nIyADgmmuu5Re/eD6UotzT08MHH/xpxF5rouQXihBCjGMHK9/hka4/0kA+Nlqw0EkLucRKewWFxtYJlJerjNeKum63icpKM21tA++DTWll7cJ3uXXtrKhtXS4zHk9w22Surqt82vQnXm+4bPiDFkIIkXKczu6INbEAFsvIZt/4fD5KS+dx7tw5vv/9SrKzbQA8/PAPWb3aic/nY9Kk7NBs67//+yFeffVl0tJMKIrCypUPRO3z6qu/wIEDe1mwYC6f//xMvvOd70dtc9NN32DHji1s2rQj4rYzZ05z//1LAPD7/XzrW9/miis+OWKvNxGKmtzl5AutEDje0tKB35/S4xTDNHnyBE6ebB/tYQgRZSwfm263iQfK/HRiDd2WiZcFVPMspRG363E4/Bw9Gn8Gdyxxu02sWJFBb290MJrGOTaX/r9QIDu4YEdy/Oy6dic3P3/HMEccbSwfl2J8k2NTpKJkjsuPP67n0ksLktp/8GJnU5PCtGmBAHakLgBfc80sXn31MJmZmSOyv9Gm9f4aDAo5OVkA04G6RPYj6cRCCDFOuVzmqEC1Eyu/5ma2sZgcTgKxLxCOx4q6LpdZM4AF6CWDx/dcEbFt8gGsioKPqqru8xLACiGESC3FxX0cPerlxIkOjh71jssMplQjQawQQoxTegFoA/nMS3ueU9jZSwkF1KEXzI7HirrxAvNG30Dq71AKOF15aSsnmjvlR4wQQohhe/PNI+NmFnYkSRArhBDjlF4A6rB5aX+mCp8jj3kc4Ljxb9hLCZlKZ8R2I72mJ1XEC8zzjIFWAWZ3DTZak9y7wvsf26QolhBCCHEeSRArhBDjlNPZjcUSGbBZLCqVLhPdxXNoPXqMU81nOfVRGzc2b+XpzQoOhx9FUXE4/KxbNz6LOjmd3aSlaQeyaZzjkbs+YOLtt/CDsi5a1OwYe9ILhke2P6AQQogLK8VrBo1ZI/m+yqViIYQYpwIB6LmEi00UF/eNy6B1sOD7oledeB77ef7wVLawjNjXevVTjcfjWmIhhLgYmEzpeL1nsVonoihyLh8pqqri9Z7FZEofkf1JECuEEOPYxRKYJkv7fUkHZmGZegNO/ow6jGSl8biWWAghLgbZ2ZNpaztJR8fp0R7KuGMypZOdPXlk9jUiexFCCCHGC5+PBvKHsYPxuZZYCCEuBkajidzcqaM9DBGHrIkVQggxbG63iZkzrUyZksXMmdaxXdjIaCSfhvib0Uv0uliVa6/1yey3EEIIcR5JECuEEGJY3G4T5eUZeDwGVFXB4zGwbFkGFRVjs7hR112luKgkE++ge1SsnEXBTz517Dbezd3X/hGjUQVUjEaV0tJenn++azSGLYQQQlw0xvClciGEEKnA5TLT1RVZ/EJVFXbvTuMLXxh7s5Letev4FuWw+14q1SdoJB+HtZXKp6wUFyuAF8gBNvFV4Ed0jO6AhRBCiIuMkuIlpAuB4y0tHfj9KT1OMUyTJ0/g5Mn20R6GEFHk2IxvypQsVFWvgqNKvrGJNb4HmceB0K19RZ/i9Bv/dWEGOA7JcSlSlRybIhXJcZnaDAaFnJwsgOlAXUKPOZ8DEkIIkTizuwbbzBnkTrkE28wZmN01oz2khMSuxKvQ4HOwhO3sZy4KgcY0ptr/ZdLsL16gEQohhBBiPJEgVgghUoDZXcML9/2WKZ7fY1B9mDz1TC+7hV/PXj/aQ4vL6exGUWJny3RiZT77MNHLMjaEAlkhhBBCiGRJECuEECngV9/7Lxb4dtLCZOifr2xlMotrv8eLtz832sOLqbi4j4ULe+MGsqDgw0QVy1nGhgsyNiGEEEKMPxLECiFECljpdeEjLer2XjJYergEs7smpdON167tZvPmczgcfqLbzgymsJWyCzEsIYQQQoxDEsQKIUQKaCVX974eLHypbBYTyu7B6GlEUVWMnkYmLFuMtaL8gowv2AfWbs9i6tQs7PbofrDFxX0
cc1azl/mk0x1zf/7+Pz+pFIgLIYQQYmyQIFYIIVKewh/5LPuZG3mrqmLZvfO8B4LhfWBBwecLpDt7PAbKyzMiAlmrazUl7GcCZ+PuV+nfXgghhBAiGRLECiFECrBZz8XZQmEl0UWeFFU974GgVh/YoK4uBZfLHPq3ockDQCs5Ce07uL0QQgghRKIkiBVCiBTgegpM9MXcpoVczYJI5zsQbGrS6wEbfb9/mgOATDpiPsZKO/uYS6GhAbs9i0svDaQo2+1ZFBVFpikPldldw8tFlcyytzDFbuXqK4nYbyqvMRZCCCGEPglihRBilJndNSx2FTGRtjhbKmxhGfsGpRUHA0cYWLs6ZUr0mtWhit0HNvJ+r3MVywxVeJkQ8zEKsITtNPgcgILfH+wgq9DWZmDFioxhjd3sruGFFW+wtO1JGihExUBj6wTKV5pwu02Y3TVMWLksco1x2T0XbI2xEEIIIYZOglghhBgFByuOMGvqGex2K7ayBZg8dTGLOwWpGHCyZuDfFgte5yogcu2qqg6sWT1YcWRYM45OZzcWi3Yga7GoOJ0DRZy6i+ewjSUEAlJ9HUygE6vu/b29kWnKybK6VuPsfSzqObp60nC5zGQ5H0Tp6QnMBnMcAz6mc5xfVHtlRlYIIYRIcRLECiHEBXaw4ggrq7/QPwtpwIeJwOk4duAX1EA+KuBz5NG+bgPdxXMA7bWrXV0Kj1dfHjXjmGufSO7U7IRmHouL+1i3bqB9jtGoAioOh591685RXByZBu3zJ/Y64omXxhyLoclDA/m6+1VaW9nHXJawnfr+mdp6ClnCdg5WvjPk5xVCCCHE+SdBrBBCXGCP77ki5ixkPA6bFzXbhqE/IM25shCzu0Y36GskL+LfocRdnw9L9Y6EA9mjR700N3fw0UcdNDd3cPSoNyqABTAaY+8rk84EClnFT2OOxT/NQT4NMffrZE3U59CJlYfbHhjy8wohhBDi/JMgVgghLrBG32VDfqwlvZcfnV6Goa01FIwaWluZsHIZjmztYkp6wRz9j7fsqR7yeLTcdVcvoBWAquSYTvN0lYLrKVAU/SA1LS0yTTlZXucqXGmPkYk34nZLei9OZzdqtk13prZR53YhhBBCpAYJYoUQ4gLLM36Y4JYqOZwkh5Mo+Mmnjq3Wckr8e6O2VHp6cKmVUWtX0+mmAysGfBRyPKooFAA+3xBehb5/4D8xRFVa9nP3tX/kfz80UlzcR3FxHwsX9moEsipWq59nnolOU05Gd/EcvjU/jW0sIZ+6wPunNPBMyZsUF/fRsebH5NGo+ViHzat5uxBCCCFSgwSxQghxgT1y1wdRM4SDZeJlLyWsZyVZwW2NJgxtrbqPmX96U2jtqqKo2Gx+VKCFyaE1n3eyF0UjoB2pFjNmdw3fqb4aP2mD7jHwyl8/FXHL2rXdbN48MF6Hw09V1TmOH9dOU052HJZ9eyhhP/VMx4+RerWAu/d+FbO7hu7iOTxS+lcy6Yx4nCW9l0qXdlXkYOVnuz2LKfbMUEug8P8+ZT/Hi7c/N6yxCyGEECI2CWKFECPO7K7hlcIHQv05Z116moMVR0Z7WCnj1rWzWF/6O/KNHsCPkT5AxWjwA34KqGMbiwFYRHWo8FCDz8EidmnPphJYBxpcu3riRAeZmdBLZIVfFQOEFTHax1z2M5dPeA4zqaw0qpdqsioeMOq219Fasxs+Xr01tkNhda1G6emJul3p7cXqWg0EPoenq5SIIHrd+r6IMbjdJmblebHbsygrC1R+BgUVI2Gri0P/tTCZpYfvkkBWCCGEOI8UVR164YwLoBA43tLSgd+f0uMUwzR58gROnmwf7WGIEWB21/DC8tdZ4t8SUTRHwc/iSw/yxLtfGcXRJW8ox6bZXYPVtRpDkwf/NAde56pQBeF4rBXlWKp3oAC5NNPC5KhtcjjFqUG3q+nptK/fHPE8U6ZkoaqxK/zmcJIuMiM+K4tF1aw6nIgp9sz+AC+aw+Hn6NELk6qbO+U
SFJ2/b6qicOrEGc37DlYc4fE9V9Dou4xs2uggix6Sb/WTTz1Hmm1JPy5Rcs4UqUqOTZGK5LhMbQaDQk5OFsB0oC6hx5zPAQkhLj5W12qc/sejqr6qGNj28W1Mn2rC7TZhdtfo9i4Npm1OmZLFzJnWYc0Mnk9a4zS7a5hQfn9kS5vy+xNO1fWuXUd71Q58jjxadPrGtpCDPy2dQKMb8NtsoQDW7K4hp6iQXPtE8tT6uM/XQm50L9WuxHu0hn+OLxdV9s/0akm8UFNFhZmpUwPpuVOnZlFREXssWseSf5pDd/vw+8zuGl4uqmSWvQW73cri6utp8DlQMdBKzpACWIiuCC2EEEKIkSMzsSIlyBWy8SN3yiUY1b4YwQwMVK5VKaABF5XMs/ySrjtKcP8yg6VtT47YzOBw6R2bbreJ8vKMiL6sFovK1owV3Nm2MWp7nyOP1qPHknpuuz0L7d6xKnsp4VulVrxr14VuDc6CO/2P00A+NlpoZ+KQAjFFUTlxQrvacfjzTVhRhtLbC0AWZ/AyUWdrFUUJtLdxOrt1P8uKCjPV1WlEvm6V0tJe1q6NDoKtFeVYdu+MmHVVFYWe2deR/vZbUSnF+wzzeWjSZjytWeRRzzd4kWcpHVbLIy0yEysuVnJsilQkx2Vqk5lYIcSoi9Wfc0CoOQz1FDKffdzQ9Qss1Tt4pO2BYc0MXigulzkigIXAOMN7jO5jLoUcx4CPyz2Hk55Rzs7Wu3insJL1Ua1xDla+wxL/ltAa2kBBp0CFY/DHbGkzWCI9WrMqHwwFsF/hZd21sIGLFgqqquDxGChfadJ9L/bsGRzAAij9t0cKpV6HBbD7mMt09a9YDr9GQVoTe62LQzPWezMXs9i0i8bWCagoNFDIFpaNeACbTjePXfvqiO5TCCGEEAMkiBVCjCivcxUuwyMo+JN4lMJr3MhyNuj27tQqCjQagqmrTR7t+4M9RpexgTvZO1CUiULKyzOSCmTXrOlGu99qIA14ue8nESm0D2tcAOglgyy8+DHy7KQV2Dipu88gRVG54YY+rr6SQGEuewsvF1VGpUQr/ZWS9zGX17gR7Vljom7v6kljjVN7Jlav24/PR0Tq9sGKI4EZ2LBt9jGXJWwPveeN3lyW+LeytaqTU81neci2ha6eyGA4dsZAotTQfzmcZMu1e7j5+TtGYL9CCCGE0CLpxCIlSJrH+GJ211CxrIct6lL0AxstKkZ8+IgO9C5kUaBw4cdmcL2r0tVFIceppzBq+wLqcFHJnezVDJCSfR1FRVba2rQDLQU/P2U+JRxAtVgwdnVoPqeCH59iYr96h+64IqlY0vsiAr5MvGxjMXNtr7Dr1udZfeg6mjyQTwMdWDULUAX3pXUMKPjZXNWNy2WmqUnBkd2BS61kYdtP8GsWhorcT6bSyTb1Hko4ELpN7zMJvueJFLqK5ieHFlrIwYCKf9B7Z1BUFizs00x1Pp/knClSlRybIhXJcZnahpJOLEGsSAlychmfnLP/m+21/0RygSwMDlhGe03s2S07yXI+iNLaGhpVcNYvfOYzGOg5WaMZTEGgOcuJ5thrTcO53SbKyjLQew+N9OHHQD4NtGOlVSOYzKeONVSyiOqE1scajSo+X/TzGeljCVs01pBqB6qx7wue08MCU7x0Y8KX4BreAuqoY3ro3wZ82kF8//remTOt/S1yBt2PXzf4v9e4nc3+sqSrTJ9vcs4UqUqOTZGK5LhMbbImVgiRUlxv/B1VVeewmnuIl8IaKbKozxfyPVG9O/WqF49kZWO320RhbgeTykq5vPX37A/rz1rCAbaxmALqUMJ6u5ZwQDclGiDP2JTUGIqL+7DZ9N87HybU/rXFreRiILKQUSadrKESJ2sSCmAz8eqn9GKiiuUaa0j1A9gvK6+RidbMc3Bd9IBOrPhIjzvGoMHvs95a7OD6XqezG4sl8r3MxMtSNlNAHQM9e/3kGz1sK/0tP/xoHqdOnKH16LGUCWCFEEKIi50EsUK
I86q4uI/jjd1sL30dM10kF8wCKByunRZan/kp+zlWlBnweAyhQkH3LTczxZ6J3Z5FWVlGxH0rywya6znjcbtNlK80Ud+SFQoSl7CdfYMC2Tqm02ebzHHLp0OprXrBlIKfNb6KJF8/uFzdCRZlUvCThpWzKP2B2NNVCqo1i3oK4jxWpYA6vsSbcZ8jMSpf5lV+vrmebWnL+4PDRCQ+a5836H3+Oi8SfXwF1vdC4Fhct+4cDocfBZX8/gsPm7ifv1o/w5mqaj5q7qK52cuRjy7h1rWzEh6LEEIIIS6cEU8nLioqWgU8Bnymtrb2D0VFRX8PbAUsBKaH59fW1jYnuLtCJJ34oiBpHuOX220KrXucNk3F44megYsvVrpqfDmc5KSlgPZ1GxKeTbv6Smhsja62m8NJTmEfGJnFQvu6DUCgR66hycPeSctZ2raWTjJD2yn4WcpmNjh+nHSrHQi0ntm9Oy3BNZ1+9jKfeRxgWWY1WzrvIvY1y8D2/8k/UMVyhvNeh++zz1FI69FjmN01XFJWGmcMyQmkbi+hhP2h2+KtiR1P5JwpUpUcmyIVyXGZ2kY9nbioqGgm8PdAff+/DcBeYHltbe0ngcPAkyP5nEKI1BXspRo+M6oMKT4aXlDVQi5KVxdW1+qEH+Np1W670kJuaDZWNRpDgXF38Rxajx7j1Ikz3FS7hvWl/0W+0hBKNf4p89lkeRCvc9WQXsPatd1s3hyYRYw/m23AyRqWs4EtnQuIF8CWsbk/PTrZQlz68mmMeK1K0jPw+hR8bGMx89iPmj6Qflyf4pWthRBCCDEyRiyILSoqMgObgLKwm68GztXW1gbz07YAsqhIiIuEVi9VVVVGNKBJ1D7mYgjrixNslRPeoib8vsGpqgMUnKxBTUujfeNW3ZndW9fO4r3Nh+h1FHJcuZw7HG8mNROspbi4j6NHvTgc8d+/evLjzKqq2DjJXuazifsB8GlWBY5NwU+60Rd1WwP5zHCV4nabOFj5zgi1sgkI7ms5G0jr8aLg72/ppP1aE+l5K4QQQoixYyRnYn8I7K2tra0Luy2f/llZgNra2lOAoaioyDaCzyuESDHB4kqB1OFoKpBna0fBj42T5BhaUVATXPebx3mQAAAgAElEQVQ5FApL2M5+9Q5sM2fw4u3P8Zmyr5DmqWO6+lee81zDhPvuxeyuCbXRWUMlejOe9RTQ/kxV3IA0fHa29egx9jNvRIpOBdZ4xn6vjDGCuqBnWEkJB0JbGdGp6KQr0CZn012HyVcaoD+YDASZgZn3FSsyWNC2Pu5YosV6fQpL2UIVy/vbMSloFYqCQGVip/PCtr8RQgghxPk1Imtii4qKvgQ8AXyltrZWLSoqqgNuBoqARbW1td8I27YTcNTW1rYmsOtC4PiwByiEuGD27YMlS6CzU3+bggKoq9N+bGkp9PYO3GYwgMngo6dvYJYwnW4m0E4rOVgz+ug4Fwxk4lHJ4RRnmUAvGaFbg61xSqy/gtxcqA9cezPQh6oxO2mkjz41uQBU633JzIRt26CkJKldkZsLLS2xtvD3/2/s65Sh191fkGoZG3Rmb/Xb5KiqAoWFUF+vuyZVX+DzaCUHG4EX1EoO+TT0pwbHGn/i66RVNfD+O53Q0AD5+eByJf++CyGEEOK8urB9YouKin4ArIBQbwcHcAJ4Briztrb2b/u3ywXqamtrsxLcdSFS2OmiIAvuxw+9XpxB8Xq+Di4EFZxFW+Psw9NqJY8Gnsh+mlvXXBWaCX3o9gZ2Hy7ChxEDfvz9M4HJCPZbzaOBNVT2z1D60A6kVByOwNgS6V17sOIIS6tn988aRkq26FC8vrGgUsYmfs3NCQWUg3utLmMDWynrfw/BSjteJug8n0pzcwe5Uy5hv3oH89kXY1xaAgWlnKyhgXzyacDV/97HD4gTC2IdDj9OZzcrV2bQ0zOwfXq6yvr1o9N7eCTIOVOkKjk2RSqS4zK1DaWw04hXJwYIm4n9I/ABsKC2tvbNoqK
ih4FP1NbWlia4q0IkiL0oyMll/JgyJUungm5igZ9WEJtsoDFr6hkafI4kRx451hxOAdDCZN2t4gXkByuOUFH9WVqxoR9w+cmhhRZyAbApraxd+K5me5dgoazB64zDx72XEko4wD7msojquL1hFfz4+2ebVaMR/H780xx4navoLp6DtaKcCdVbQtuEMyh+Pj7h5ZXCB7i38181+sfGo5JGt+asOMAStmvuMxMvXWRozpKHC34+TqeZ1tboixE2m5/33x+bVYvlnClSlRybIhXJcZnaRr068WC1tbV+4E6gqqio6APgOuAH5/M5hRCjS6+IjsOhcvSoN24AO7iacXl5RtJrRx+56wMyGU5wotDCZM4ygXT011N2dSm4XNpB4sGKIyyr/hKt5BB7xlDpD5QDazpb1RyWVX+JgxVHorbUKpQVLodTzOtPDS7hALsoZSC1WFuwp61qsdC+cWto/W5wlrvvC3/PEuMOtPqvLlgY+Cyd5x4ZQgALoEQEsACdWFnAHgC2sZgC6gB/f5/ZQKXnbSxmKVUaYxoYm8Gg8nd/58PlMtPaqv2e6d0uLrxvzz6N3Z4V+u/bs0+P9pCEEEKksPMyEzuCCpGZ2IuCXCEbP7RmCxOZsXz82b+hwe9AK+ALptya3TWhXqzhs4W6+9xzBQ2+aZr7TFQOJ8nCSz0FOvtRURRwZHfgUiuZf3oT/mkO7J7/rz+A1TdQBClavtHDkY8uibhNf5Y7sE54R/b3KL7tHOZDr4Teo+obnmXZ3uvp7Y1+XHDWc67tFTpcP456L4NFrpSuLpaxgW0sxYcRo0HlH6/x8957BtragvvVfm9M9NFHWsz3Qcvg9bpawsc0IHwc8VKOA+nQY9F4Omd+e/Zpfls7+LuvcuWlrRx+N13vYSJFjadjU4wfclymtpRJJx5BhUgQe1GQk8v4kkxK8MGKI6ys/kLMmTwFPx2l92J5bh9KV1fodtViYecdL7H60HVRa2hdLnN/dWSVwUknJnrpI/FiUHspwcmauGtMwwOvWC1fILAG14chamxBCn5ONEfOJuutNzYaVTZujHORYPcnaFAdGPHjwxBYf2pdwy1PfVH3QoBt5gyMnsbQv/cxt/99KAiNMpYC6nBRyd3soBtL3O21Hn/c+Dfg96NmZqJ4vVEhalfpPXjXrou7FltLdraf2lpJJx5tdnsWehdBtpe+rplaL1LXeDo2xfghx2VqkyBWjFlycrl4JbJ+tYA6jjM94mfuPuaykvX9a0kH7klLC8yMhhfxGcyWfpbTPVbNdZ5aMvGygGqepTRu2mywUFKsIDYTL1szv8vDnZW6gbHWTOxQZrkhOhgN8jnyaD16TPMxbreJH5WdoLG/4NLXeTGh1x8UHtDrF2mKXhMbyY9fMXHqxBkArBXlWPZUg88HRiNdd5XiXbsOiD1LrSUtTeWZZ6SwUyrQD2LBylk+rPq3YfVXFhfWeDo2xfghx2VqS7k1sUIIEU+j77KY92fixUVlVAC7hO1ha0kH9PYqMQNYgNaeCXFWikbqxMqvuTm0RjMQoGpfWGsgHyBUGGowAz7Wl/6OW57+Iq60xzTX3KZxjkfu+iDq9uLiPtatO4fD4UdRVBwOf0QAG+zPO7gPraHJoz2W/tsPVhxh1qWnsdutTLVbsNutLCvLoIFCVAzUU0gVyxMMYNXQutVgKnDwPdGyM/v7GHT60yrA3knLQ//2rl3HqY/aONV8llMftYUCWIDszHMJjC0wPofDP6YD2IuJlwkcrHxntIchhBAixUgQK4QYVXnGD3XuiQ6GgpysGWIhoQCjQYUEZ2GD6smnhAPUMR0/RvKp19wuWChpPSujAtR0Qy+7s7/D3bu/jNW1mm/NT2NH9vfI4SSBoFjFprSwufT/6aZQFhf3cfSolxMnOiIKZbndJspXmiKKYj1QpnKw4gj+adoz3f5pjlA6d4M/j0CIbQIMqFEzY4nNciqouKhkHgcCr0hRyOt/T6Lfq3r+47anQ+18BlM
x4FTWJPa8YWnm+lSqqs7FLTAmUonCw20PYHbXYJs5g9wpl2CbOQOzu2a0ByaEEGIUSRArhBhVWpWEM/GylxLqmB4RwKpKIJCKNbMXTyZefMlMw4Yo5NLMPuYCsIZKzXG7qAQGqgMHZ27zlQZ2+RdwZ9tGFFXF6GnEsqcapbsbK14UVPJsHbg2Zw5pDeDD93fT1RNZQKmTTB6vvpw9lz9CgVKPAR+5NJPFGRT8mDz1LKm+blgXBAZTMXAnP8WAHwN+jGofDRQweOY6nW6+nvZ/qa5OI1aA7GlLrK14q39S3G1sVpl9TUUOR+zlQg0UMKH8foyextB3Z0L5/RLICiHERUyCWCHEqLp17SzWl/6OfENjINjTmX1V09PpWng3Pkee7syePpVge5YFVGNMKpk4KNAKZz57UfAzn30Y8JHDSRT85Nna2ZL9g1CLGyBi5rZeLYh6Tft9c7i3819DabuNrRMoX2lKuqXQi7c/R0vfJZr3NZDHfW+U0KDmo2Kghcl4mUiwpY9edeThCPRvDd+/wuBAtQ+Frb2Lom6P2pcKsy49rdlyKJz+jH5AutGH66m4QxejIFCMLXYgO3imXenqwupafR5HJYQQIpVJYSeREmTBvQhndteQVfkgSlsrAKrNFtECJpGKxpH8qBhDa2lHcuYxnW52sYgbm7cAkDvlEpQEz6t6BY+CLYUSNcve2j/bGc2AL+ECVqktkG6dTwNPZD/NrWuuiij2o31MBD4Hm03F5dKvkD0WjbdzZqziTqBqXmxRFSVU9EukjvF2bIrxQY7L1CaFnYQQ40J38RxaausCBXyaz9Lyfl1EwFJ6aAELqCbe7M0AJdQeRjuAVcP+S04PZhbwLFOmZHH19G72q3ck/Fi9tOimpuRa0TSSp3uf3nrT1Kb1OSiAgQYKWdr2JC+seCMinTQ0o2/0BGb0jR62l75Oc3MH778va2DHuvBU/iC9td5CCCHGv7H460YIcZF7zjObHdxD4n1HFeazL6y/6WAqmXQmsb9IPkyoqkKjN5clbI/6sR20j7kUchwDPgo5jo0Wze2mTUsumM4jun3OgKG9puSpulWGR1onVpy9j5HlfJCXiyqZZW9hit3KDw/O4qGNOZxo9nLko0ukv+i4EUjlv5M9oe+WarHgda7S3NrsruGV6Q+EjotZ9hZemf6ArKEVQohxRIJYIcSYU2lcG6O3qJ7odZlBRqMyYinGnVhxEl1RN5jKXB/WtqadiaQR2RrGkt7bv0YwcY9d+2pUkalIyQTFic5ID8xe2zjJrmt3sDv7O1GvJ3mJjbWBfA60fpWlbU8Oe02xGHlmdw2PFrq51J6J3Z6F3Z7FdLvKi7c/p7m9TaclVTgVE4vYgc+RR/u6DZq9Y83uGl6477fc7d0QOi4aKORu7wZeWPYfEsgKIcQ4IUGsEGLMafRNG7F9WSwqvhGeQAxPEw6GZFqpzD2YmWjoJL+/gnGerZ116/uSTn29+fk72Jp+n+5MqNWqhnrL2qxdWGknGIBaOUuOoTVUVCvQ3zb27K2Cj72U4MeAz5ZLY2kld/31ceaf3sSuzBXYwloGDfTUTTQ41r/YEC6fBs33tKsnDZfLHHFbsH+u3Z7F1KmBgCq8j64YWWZ3DT9Y3suWzgX967EDn6mXiSw+vJBXHCujgsk5mS+RyPHRg4XtztqIADa8P/Jn7vsqS30b6cE86HFmVqr/SpbzwRF4hUIIIUabBLFCiDFnWsyWHIkESyqKEgjs1q07F7fFh/7zaMsPr56sKPgcebrrX1vVbI40B1Jgf/8+mgFs+I90veCrpPdZsmnVfI70dDh61Mvpzbs45c+hg4moGNhLCRn00OLPRkXBi5UWcuK87kDi8DzlOU41n6XD9WMsz+0LtT+Z37mdU2nT8FknBIJcjKgYRrQKcrCVUSJrit1uEytWZODxBKok+3yBgMrjMVBWlsGll0pQO9KsrtVs82un+/tI4+GeR6Na5LyUUay5fTQl4iKF222ivDw
j1B+5weeggwmaj2whF6VV+zsihBBibJEgVggx5jid3aShlXKrUsYm9lJCAXXoBZpGI5w40cHRo4GCP6tu+G1UOm463RCzFY9+JdV68inkOPuYS9fCu2k9egy9GjRa61/Dg9aiIiv33TfwI93jMbBiRUZUwOWf5qBVJwA9fTowVqtrdahVyT7msohqWsglOFPWwuQYrytSsKjO4H0Wchxj7zmmn3ufrVWdtFftQLVYADAOe82sipE+OrFQyZqE1hRXVprp7dV7TQp+/0BQ+8AyNW4rHxGfocmDL0ZF7Abyo1rkJNoPGCIvUrhcZrq6Bn++F2oduBBCiNEiQawQYswpLu6jmkXkhKWt5nCSvZSwmftD/Vn1glifLzJQfHzPFSygmoL+tN4C6thFKUvTd/WnwyYjUEW3nkIWG6vZ84VngEDgbbFEjsdiUaPWvw6eWWprM/TPHg7o7VWorIxMl/Q6V5GneDRHFAzqDE0D9ztZE5VyOTD+2DPTOZwKFdUJ7nMZG5jP3tCa3wafgxX3pbGfebSv24BqNLKELRr7Dnx+RuKnUCuo+DARrFLcQg5GeiO2GbymuK0t8YCmU83k8erLdddNJjIjLgIXOGJdsAhmKgSPHbO7BpvSlvD+s7PV0Ofg8ST++eZwCjXblvD2Yvwyu2uwzZxBrn0iuVOzybVPxDZzhqyZFmIMkSBWCDEm3eF4g1PYQ6mqp7BTwoGIbQrC03rD2GxqVAris5TiohI/RuqYTgkH2Ny7hJ+WvcVQWu8AdPnModTH4uK+/tRlf0Qq8+D0Ye2ZpWiDg7Pu4jk8svAvZCqdEbeHB8rhLUn0UnGDAhcIgutZB6TTzRxqmF52C3Z7FgbVRxZnqGI5g/+k9PpMOL8XGFv7xq1ssjxIGZv6A9ZA4FrGJlQMPMuCmMWpFPwaKckGfJiwcXJYa4rDNZDPpLJSZk09w6v2peQUFfLK9AfIsxspK4ucEV9eliaBrAavcxVLDDvQ+t4Y6cVFJRA4Hg9WHGF62a20+LMT2nd6ukp7uxL6HPRnXSMvPqVxjvV8h441P07ilYhU53abuPpKQlWoX7UvJdc+kVft9zLr0tNM6V8qUFFhDl34uPpK+MHyXj7hOYwRH9N9f2Y/czF6GqPS3IUQqUtR1aH9OLtACoHjLS0d+P0pPU4xTNKE+uLmdpt4+Dt9tHQHUgptnGJ9egW3rP9HzQqkELiSPqH8/lAqKwR+Mgd/0u5jLveyBS8TiPyhGzyXRP/4LaCufwY3wOfIw9jYwKJFPVRXp2k+Jh5FUTm9eRdW12oMTR780xx4nat0X9eUKVn9P87jUWlu7oi61e024XKZaWpSmDYtEMAGg7rw96yQ49RTqLv3TLxsU+5FBZzqEzSSTx4NfIMX2cE9SVSHHhin2V3DhLJ7NN9FVVHYufA1Ht9zBY2+y7AZTqMqBtp8E7HREpb2HM3h8HP0qHYAfOV0E61eS4JjjZSJlwVUs4178ZGmuY2VDo43X/i/T6l+zjS7a6h4wMi2zjtDvYqttLOVpZRwANViYecdL7Fy9xfpVDMT3KuKzXou7udpsajMz/8PXqn9BI3kkUcDrvTV3LL+Gt3vnRg5F+rYdLtNlK800dUz8N0MfmefpXRQ0bfwvw7R/87EyzYWA3Av2/D2P9ZggAULelm7NrmK8SL1pPo582JnMCjk5GQBTAfqEnmMBLEiJcjJ5eLldptYudxEjz8ySEjjHNXKPXxz8z/FDGTDg8PuG76K+dArPOeZzSJ26aTLxuKngAYa+gO2x649ROlvF3PyZDsVFWb27EnD5wv8sFEUItJ8tWcKIc/WTn3XlMhg22LRbREyc6a1vwhRbDabn/ffj9VWR1vwPXvOM5v57CVWYJ5na+f374c958wZfMJzOGbwGy0y2LbNnIHRE93XVjUawe+PCvJftd/LErbHbIGkKConTkQH9GZ3DS+seINFvVuHcCwEGOnrT2HWo30x4XxL1XPm4O9k8LPUun2GqzS
hYz0oX2mgQXWgnUQWSEvPN37II3d9ID2CR9GFOjb1zpUGfP1VsZOTw0lOM0njgpVKaakEsmNdqp4zRcBQglhJJxZCjCqXyxwVwAL0koFTfSKi+Mtg3cVz2O6sJX9aH2lN9RQd2sp2Zy0/cPx0SEGLAgNrOinkvsPz2PeVXQCsXdvNRx910Nzcwccfd7BxY39qMCr5Rg9LqYpO5U3v5UdtZREBLBBV1Cac1trZqJTedBWXa2g/qNwH/Jg9H/QHsNH7Dje42I6hyRM3DXmwHEPkWkevc1Wo0FPQPuYy3fdnjGofn/Ac5oUVb4RS+iqNP47bw1erOBYEik6V9D7LLkrD1k8nJ1aBIhEQWitst3J52S1M8fx+4LNc/jpmdw3dxXNoPXqMUyfO0Hr0GN3FcyIKNMWTiRdX5hMYddeoK2TSxRrfg1j2PsvVV5JUSyVZ7xwt2dZUZncNOVcWgqKQa59ITlHheU3N1Tt+/EP8adtCrk7GhcKePdqZGEKI0SNBrBBiVMX6IVtPPpd7Duv+sBxcBMnjMfT/eyjVSdWomdROrKx87Zv9a66yQuskbTNnMI/9HD3q5URzB0c+uoTVzXfx9GYltOY1z9bOdnUxJeo+zWcLL7IUTmvtbGlpb8S/16+PXkubiBdvf45Fhxfhw8xAP1b9Qk6Dg0P/NEdk+6A4TPTy5IL/ibitu3gO7es24HPksY955NLMfPaFLh7UU8iS3k0crHwHiN8TWKs4VlDwPS7hAFl4GUo6eLyKyoakC3+NLw/d3sCyMnPgO4iBFibTwuSBz9K/hV898F+aj9W7+DCYgo9tLKakcwe+GD9bOrGykvUs6d1EY2tgGUF4S6Xy8uiq3qB/HtHaNlQQaMolmoWAxkswHP6eaLWmstuz+NSlvaFq3mZ3DRNWLsPQ38JIAQxtrUwou+e8FE1yu00YdA+FoVan1n/cSPcSF0IMn6QTi5QgaR4Xr1jps4NTdC0WNaIYkt5j46eAhlPJ4VSMNZfaa6fmWX6pmxIM+mmzQT5HHq1HjyU4xpExy95KAwU690a+zsHvNfSn5973W0p922KsiQ2cq3MMbTy54H900zq11rOFy6eOI805MY4PFYcjcs3vYOGfgQFf0r1q46+JVbl34gEe//M3k9rvSEiFc+bBiiMsqb4u7vsa/CwHCwZKkYXMBq9dDBQT25lexlz7/00gnT368eEcDj/HnNURqc3TO4/1B73R24avtdZchx+2NEDr9Wh9j8aCRJc1pNPNptK3MP/y5zzS9gAN5JNPAy4qowrtqYCabaNjzY+HtTY52PtZv3VW8NlGstVS4LyWTz1PZD/NrWuuort4DtaKcix7qgNRrtFI112leNeuG8HnFSMlFc6ZQp+siRVjlpxcLi7h6+P2TlrOPaefpkdNH7SVH61kkfAflvpFkPSLN2lvl8i2A4IFoGIForlTLkHROb/GWhN7Pk2xW2MEHCoF1AfWAxs/5KGNOZo/vM3uGn71vf9iqfepqKJZmXjZkv0DbqpdE3csV1+JZuAQpODnRLN3WIFBMoWsAlSs5l46u9PIMzaxxlfB3OyX2d9THPZ6B7a9d+JzoxLAQmqcM2dNPUODT6cBcpjgZ6klvBCZI7uDtlaVDiZGbZdPHY9de4jvHf5Wfz9k/T7Nsb7LCio+izUiENW7wDF4rbXehangeUAv8Ms3enhv4yvD/r7rrTc+HxIvMAc2WjjLRPrCLvSY6GU3C6ICWRj++a+oyEpbW+wAO4eTdGOmI6qw3/Bl4mVb2nL6Lv8bHq2dT2NY4D6PA3SV3iOBbApKhXOm0CdrYoUQKS8YWBg9jSiqyp1tG9llWkKWoZNgcRYFH3o/PMLTj/XTEWO13tDaLtYP4mj1/etC9VKCIbKdTcQejcZRCWAB8tCfGQYCgTlG3tv4im6A2F08h68ef5oPq/6NvWmlEb11t6Utp/u2f9ZNpwxPxfS0xl7n6rAFgp5EWxPpjTWYuuzCSSaRa5bTOEdOf3u
efEMj20tf53hjdyhF/MbmLbTU1vHV409zvFmhubkj7D/vqAWwqaLRd1lC2+UZmnTvKy7u4+hRL5s3n8OLtT/oiNZAPve9UUJrRMZE5PczEy85nIo9FmNT1Bp1vRT5wecXve978Ha9pRGNvsuYsGwx1orymGOL5WDFET6z7AZMnjrS1B5Mnjo+s+yGUDqvnnjpz4MF06GTmd9oxRYRwAL0kcbd7NDcPlZNgETE6/2ciZc5/Ky/uJNWZfrh6cTKyt5/YVntd2kIXwbBdvYzNzAzK4Q47ySIFUKMqIMVR5g19Uygb9/UM1E/sqyu1VE/IuntIZBsEQgoVYwoOr9Twn9YahdBOv+CxWX0AlXQLmCkWiy0b9w6am0+Hrv2VdBc46nyad4N/L9sW0Lj6y6ewzefmc1fHNfiU0z8xXEtXfMXsOK56yPWFj6wTOVgxZGoixc2WnT3bUnvpdI1EPwGA50TJzo4etSbVGpmsKDQjc1beLpKIc/WHghaqWNn9vf5a9WvONHs5cjHk6SibZLyjB/G3SYTL48s+HPMbdxuEytXZtDaGlh/qcWIX6MVj4LRqIaKq21jMetZqdtv2GJRWeOrYB9zKeQ4BnwUcpyv82LUYzLx8qRnPrn2iUy8/ZZA8GcwRD12H3NRJ2VjmzmDPLVe83nzaUBRVSy7dya0LjR8XW1RkZXp060srr6eBjWfYG9kMNCg5rNy9xdjFloK/87F64M6eB1s4rS37cbCMjZo3hfrAuDQBS6CdmNiK2UaBeFGbka2hdyo/XdixckaWUArxAUiQawQYsQ8dHsDS6qvo8HnCFT49TlYXH090/PMoR9aWj9enKyJ+oGqqgqKEhmgahXxycgI/HAZqavsifBhQLVY8DpX6W4TPguoKgo+R96ozcAG3fz8Hey6dhdGuhl4zwIB7DGuQrVY6Fjz44T3N7ji7OpD1w1a3widaiaLq6/n8mW3sb/rNiBQjfis5oybik1pYd36vvOyhrC4uI/fv08gaG3O4abaNdI3dBgeuesDjYDRj5WzgQsFSgPrS38X8+KA223ivvsy6OnRDzDS6datEu33E5o5v8PxJiUcYBuLKaAO8GOkD/CHZvD92TaWsD2ikNizlHJX5r+Rne0n+J2w9M/aK8Dzh6cyvewWDL7e6CJkbOfA2W9g9DRyBbVozQ5/nRcp5DhGtY/P3PfVuFWSw4tMtbUZ8Hr1g8pONROXS7sSu9YFw1izoC6XOer7GzDUc6vCFpaxj7lR98S6ABiPzRY7A8eHeUgtdkbi70gD+WCUiuZCXAgSxAohRoTbbWLX4U9prC1T8Hans3JloNqn1o8XvbYtqopuGmnwx15gbVSi6cMjI9/4YUIBqVZbkdF28/N38FFzD83NHZyp2kWfo4A/KJ9PKsjWq8CqX2laoUXNYRHV7GMuTtZoFoaycYrjm3855orgXKxuXTuL9aW/I9/oCQStRg/bS3/L8WYlcKHgRHbcALa8PCOi33I0FTXGOtfwzIxg9kMJB6hjOipGei0TOVNVHZrBdyprNGfQavr+me7T5wieS1qYzHz2oeBnPvtoZTJa55lOrDh9P2QZG3iNGxmcvlrIn3mW0oHWXT6HbuVjiBVI6mtqUjS/k895ZkfNGkPgQqJWmrH+91cNa1GV3EVDFUPofTTRyzI2oALdN3w1qdcYzuXqJj39fFy0VBKqNp5Ot27aeh4NdN1VOtIDE0JokMJOIiXIgvuxL5Fqljabn+Ou6qgKnwVKfX+qXKTB1UGTfb7hsnIWFWPEj95MpZNt6mLucLxxXgurDJfZXcPBynd4uO0BGsnHYfOGUnSDhXSmTYtd3VdrH1+51cxzz6VpFlpyucxxP5MC6mggX6fAlMr20tclrTcB4+Gcmch3OFal8Uy8PF1liKqgrVf8yO02UVaWQSJVyJOh4MeAX2ec2vu1Wbt4/3j0926KPQs1yXHYbH66upSI72RamorS2xPRLztYWX1u9iso57qiqiwXWE4kVKX
ZNnNGAlWi9aiUsYlNaeW0P1M15PNnsCBYoJ3aSF7A9JNOb8w+41bOspWlLFW6rKIAACAASURBVGF75N8GvGy8dj83P3/HCI5HjJTxcM4cz6SwkxDiggpd/bdnJdSbtbVVYT/zotJsH1n4l6i1rbH6f0Ls/rIjpRczC6gOFS/Kp45t6j2UsD/u+rLRZHbX8MKKN1ja9mSo8Ehj6wRW3pfOihXR/TAfur2B6Xaw27Ow27OYYrcy3a5ySdkiFrStj9jH7uq0qJmiri4Fl8uM09lNptKpM6qAYAsObUrMNX5ifIn3HU4z9sXoC6uyJfsHEZkZM2dambRsEYXUsXWzNyL7oaLCzLJlegHs8ARK0SWXQtrqzeDF25+LuM3sriFPiV18bbBMvNDdHfWd7O1VogKxTqxUKk+CgmaasUutTOg87HWuwpX2mO7a49gUqljO9N4/hXpBQ/IFqILr5OP1cR6Q2ERIAQ3sojTm9l4mRKStB7MQnq4ySAArxAUkM7EiJcgVsrFHu8djfHqzq+GtNuLNEMJIzMQmNvMSbKejZzT6vcaT/EzJ8HsqBtuRHKw4wtLq2bqzZ/lGD2t8D0bNYoTLs7Xz+/eHNZxxbyTPmXqz9ucrrTuxWTSVMsNWXuIbNPjzou7Np473ql7T7c8azJiYa3uZ/eeKuatzi257qUy8WOikhckj8OqiX4feawx/DRD43j7nuYZSdsXowzyw3yzaSad7UMXm2BRUfIpRs/2Xqihs3eyNaHm05twDlHRuD9xvzYL0dJTTbaiZVg54b+FeqsJaTyV7DlHJzlb58W2/5e7nvqHbfzcW5+z/ZnvtP8V87mC/519zM/UU6G4bnKku4UB/hXz9Cyjhx9JotUwTyZHfmalNZmKFEBfMUNZugf7sS7IVaIdWmTiwlstIH1YS+2Omt1436PxU2RweQ5Mn7rgjDX92Krg28da1s9hS+gZpRM+ipxt9PDFxLfP6ZzH0ZjsaW7O4I//dYY/pYqe3djk8g2LWpaf5QVlX1Kx9+UrTsGfEw5//yiutFBVZsduzKCtLpAKuwq/9N+HKeFyzcvBj1x4KBQ1a56JONRMnLgytrTzcWRmzP7KFTubwsyHOLMYSbBemfZw3UMD0stv41KW9TLFbudxzmP/kH1Bivi+B9allbMKPMWytbmIMRjCqfRFrZAd2rbLYVcQxZzW7pzyI0trCXZ1bmM5x9jMXg7cDQ1sriqpywPvNsAB2qCm9gcJVK3d/MVT0LXRPgm14qrwLKWMThtD7rJJO50DrLOpYX/o7/vVaN8eZrruW1UhfKICFwIysnhxO4c+2pUzBPiEuVjITK1KCXCEbe6ZMyUJVk//hEmuda7IGZnQg3g8pBT8/ZX7oR8o+5sacDQy6OGZik6MoasRnH1wTG37hwe02UVlpDvV0tFnP8UxPGSW9z4a2KeR4jDGqTFLO8qcTcq1VS6xzZnBm9Z62pwallAYqUdcpV0RUA1fwawZ5w/muVlSY2b07bUjniPBx+RQTOxe+RsXuz9Kq2oDAseR6itDxpncuUvDjx4gBX4wgNihYrCi4nwtXKC6c3mcBKgXU46KSeRxgMs1DmDmOnBUOn3kM9xVe4TVu0N020XOn1nPq0TrPqorCqRNnYj4u1z4x5t792TZaaut4uaiS77Y9SovmrHVgne5m7g/dso+5mjPiJnrZzQJubN4a9zWJ1CK/M1ObzMQKIS6Y8KqgiVKU2OtckxWcvQ30HI39Q0mFiB9rg9c05XCS9EGzhwp+vs6LEfuI2GecNjujRW/NWrrRR1pa1KtIat+ZdLJwYa9u1eig4uI+amu9NDd3BP7LLooIYAFcVMaY/VI4rU7k27NPJzW+i12wN+h32x7VKE6j8Ec+G93OSuenwFDXnbvdpmEHsAA2WlAnZdP3hb+nK8NG8EJVq9cSUeFX71wUXHutvwY7nELgJ9GFrXQ+WKwZY4A72dsfwOYmvWetysqVrIm47Su8HBXABrddwB7+f/b
ePjyO6r77/pzdleTVyhCtjBzHK8lOG+RAk7uP4yu524KThkJy9cEhqVJiy8ZGBhtLBrsxuSMihRJCVsEpOJUxll/AwsZvdbNP4+blCRBaArRpuB2e3CUkqClYltYBy9YKkFbvO+f5Y3ZG+zIzu6sXvLbP57p0YXZnZ2Z3z56Z3/n9ft+vixhrOJBVAFtMlOt4Om5z5DzPWFWOZLLh8TVuyXgOIhpvceh7MB70W323gh9zY9IjKzlCO2uTVJnLOMsTrKHWnX86CArFpYgKYhWKS5xcBTUMrr9+HJFDACSE5NZbx6a9z+5444l4T5gzVuVhhhWHhptzlHMbexEJFgsSF/up4xAr0Er9DNXdnle+r3aM1NzMsu3Xsqv0HirjQXqFv5/WHaNs3z6sB6BIKjlFSVZl1RLM0rxfsHXrSE6l32Bddr0yQ1kxCH7WEchL8ax8xfAGtQ9ysg/QJrNQBXp5b+4BbPqx+rmMw6M1luXChpgYWLcWFBMlSJN+PjRRwHCO55PdOb6XGDY99sFY7nQnBI+HWGFhEzRBDA8Sl22/+wSSKjrZwzp+ymcZ817G3rrnHNs/KkTy/JBpgbAodAzvE49n/BTE6CgPHPhQxqA7NYiWwM3Vv+Is5UhcSFyco5xajigLHYUiT1DlxIq8QJV5vLeYNhThbhAiSeTDSaTCeN3R8LWsF3vTMjrW6OIdLS3OQk1WmGJPYUGF+zQtscYka5tMIkIGdmVzqdiVt05nCXQ+YGTrxNAQDTxCGxtxuiku4yxnxVyGbr2N6NZtkzqmf/HVuMPWyqv6woG9wM94oCrvSrbPN3ZzplFe6fyZppNaxlrMIA+3iUktOk2u1UDDal29kk66RZV1uXBcTAxS5grRTYu8J+n3PmfS5bc6hQwxihd7ex5snssP7KyK3Iyj4aKSLgbwTYO4VdxCR2wCKdECFeZ8HQp5aG4uIhJJznZ7vZLty5+j7pk1ltZIVjjNJ6lkU06eWs4shaB/515CRzS+8vznzYXSEs8wBbOLePvt7AQIFfmDus/MbyZTTqyCWEVeoCaXmcFK8beWw2k+ralY9XkmBj7OvYwGkjLO8WDdryfl+2mpOBoPRmu93+fx5T9i8xOfcAik9RKwKroI0mQZwEp0xU0RHYgXE1rf7CTeLF8MGDeA2fS2CTSeFKtZtvPPp5R1Thw/qegljHYZIIkm3Bl74y41nILYjVksTCSSqN7aRSUVdHNv3RuT9uvNXTncPggUaMwPYLk/u8WlotAxSpq/iohEzD1m1xebTGJgk6l/e/oC2Oncl46+QKHvNfkzSD3WVI4tcaFxB7vYWfV3vHvPvbbzRa5K9FZk6oVNJNP1qpAR9lGXdo14svRObhvYztiY/ZGs9AAU+Ym6z8xvVE+sQnEJk6pE2thYxJYt6Z6gx5t+5RjAgnXpp1GmCJkVe4uJcpBVdNU1TfpG2FJxFB/NtCCGhvQSMYdMsN83jOYtoZOF1gGsEAzV3U7vyd+bj9n1zk22rDJfMb7fZloylNlJNrj2TDmABb3EOdEfWCv1o/l1hc+nS2uTyrgTmcVwxt44xQS5BbATZZ87uYtOFhJzFfBK208n/buF3JTDBVq8pN36fAP+qOX+nHykR2pupve1TvrbHjPHW4X795bb2pcIS/6QDvP/nOe8TJ+102cxtR51azTKXBF0dWQjw66XxOq/My3ep5p63pMpAdf74tvahnmrZ5D7e1ZDZ2fSfJHaslLL4ZzbERIpCh0DYX2uVp+ede/9RJ+rVQAL8PW+ux0DWEgua1coFO8tKohVKC5wQiEP1dU+07bCCFifeKLAso/s6313Z9ynVdCQGNjaC6UYN8XrWdb26UmXnoK9qIxxM9kd+4Dta4vFIMGHSA6a/P5kW4Sde83z0wK6D6XVzU5xMdMqRpUPGN+v0425QGND8X4efLRg2vp+R2puJvLyq5w78w69HZ30vtZp/ntP3fNxm4wJXMR4rKA+L8Wz8pU9bCCbYERfaFq
ZvsijafiC9+fcI59ITc0427YN4xQgCjSq6ORJVhGlxHY7w69227bhjGJiqSSOt6/tKKNYDCY9X0zUwWpL8Cw30MAjQLbiULlilfl0pS3oFDKS1tOri9DZB7xn5RwCAZmWfZa4qKQbLYOAVHYIAgHpGIgaFRjucDdCStzhbmZvuWtKfe6+4P3WPrfA6NJPmXO+sUWqiF8VnRxkJRIXZ71VrCh9yvI43VnalE1WAE2hUEwNFcQqFHmGk9BSYra1utrHwoW652JfX7rnol1PWqYLs52gRmJgaxnsxW+KTxZcybK2T0858MmkOGqXWXEzTuutv6CmZjw5aHqtk94OPWiKvPxq0vlFm+9Der1pNzsV/n727OGiKxWLNt+HxP7G3M04O9tG+GZnzXsmXHXT1iU82jZKhb/f9Hd8ovRvWLb92rwUz8pXYrgdnp3wSV5Du3WPuBDTEnDU1IzjcrjD2F+6mZPigywPvEhFUY/lNmWco5bD5v6mkr2rqdHnhUpOmYHMHtaxmw2OCtl72ICEuEiUdbVAZpz6aK2fSQy49lFHO2vTHqvilO3RDr5vo21w1U2FQ1Au0N+nYTnkTKYALrGCxzxClh6wVhSFjulaDjZ43nidaPN9nDvzjrk4Cckififdf0itOGqK8w20fAfp9SbtR3q9BPzZ6SBcbJU6CsWFggpiFYo8InXV+mj4Gj5Sfx1zy0tYtMjHxo0T2da+PhfRaO52EBWu05aPS0hT3E0MmhcOvsohVgDpK9uVnNJ7VTmCLJk9lY/AxKqEUKBxikqqxCmu/7N30hVJxSC76l7IuRQysdS1Vhzl9cBS3m5r55evwcqVU34recdIzc0ghO1iRHvpl89L4F5TM84vX4MzPVFO9JTx2Y4WFcDmiNs2hjWyfoIYHtrYyBx6OMQKDrGCBZxEEKNAjiKIsYCTNPAIC4d+w/vq61i82Gda2mTLmjVjWAdCgiZfK+fOvEP79fsZGC1I266YKK1snnSwY8VNW5fwSttPieE2M9CZFLJjuBmqu50VfutsXToyg6WMTPiznrvdaGbAlXieQZqopIsuKmmmhb/kh5Zl+BIXzaLFNriqiOsE2AfvLirdp3FlEbS7XDiOC6vWFKfHE0ltkTneeELvq7fZXkDSwouxOJmI9Hrp37E7aTEztdXBuA42BT0WlmTJOJW1KxSKmUUJOynygku54T5R5KLCdZqW2FcnYShvR7pwRz072cmd6VumGMtbCSsZYk2tbDazONLtBpcLMTY2sZWDwnEuGJ9NOCwQJJfGeb2S5cvHeOYZz5QEQjJxsY5NX+MWvO2PcZgVNNNCF5VU0kWQ5mnJpCtmFrtx2dhYRHt7AdkL9hgqxnbZwmQV2VxFbMrLSyz3LYRkz63Psbn94ylzXPIckzovTQf+xVdzNHwNm2lNsSJKP0+3S+PNt/RgL5NgVaJA1imqLPeXHRpaPKNu7MHqelBMlEG8WOUjhJDs3DmcNod7vZLdszZxS98ODrGCVRyyPE+B5NM8y7Ncl/F9pI6LxLFppyJsJR6YSCjkYctmD0OjBQnnpPFpnuG/qU6Yr6xF+4z9m0r8WaoeW51HU1MRfX36Z+DzQWGhVOrEFyAX67X8YkEJOykUFxBWvaxdsQC3cJAGHslCdCcTVjeugh+5Pme5dWofrJWwEgh6uYL17DWzskiZFMCCXi42+847bMsQjRX28vIS5s0roby8xDLTU1MzzqvN7VSK7rTerqEhwTPPeKZUYngpE926jaG626l1/QMnWUgMN28U/5EKYC9wtm4doa5uzMwGOmcFQb8NcMptTTAZEZtAwPrY8+dLG/9OQQlRMzDJVtTL17iFOfNKmVN+GXPmleJr3GK7bfv1+6ljX4LfqvGXLrK0es3EnNLcPEJhodX70QWC1vAE+6mLK+FOvk+yUoTpb3ssqRzW6nowiA+3TbbUyMLq1Sr6X2mpxvblz7FytJ1DrGAzrbbnMD8g+S//J7J6H07jwiobeqhgDQs
HX00SIUzMuIZCHlqax5MCWNAzzM9yg+mXe4oF1LFv4lqUgJHpTWwp2dvcwdXBuqTjGNi18dTUjNPREaWnZ4CengFOnhygo0NdcxSKfEBlYhV5wcW+QpZqKfCZD/6WI88vZBBrdV3DEmFq60zW2ReBRqxgVsbMaSavR8N+winHI91u5OWXcyTyWZrcW+mOzafUL+nvF5aqj1aZHv/iqykId54365uLfWwqLkwyjcvcLbGyRdLTk/6bO954gnv2/w96tVJAVwcPPqQ/Z5UN3LZtmIb6IuvfdTwTmW1Fx2Vf/ByFzz+Xlnt+Yuke7vv1csIRHxV08a3Sh7mp5Y/5o+ab6Y7YtT3oM6/bDatXj7F1a3KpaLLXKfhFhO1yE8sDL7Cw72W6o3PS9pgLxQzSWvcLbtq6RP8ON9UjxsYQxLC+HmgUM5QU4BpVKkePJov7eQvH2Bu7DWLjrKWdUawDT/P7aZiVtd9v4lycOjYTs6EH37eRO6LbUgLU9Iy/3kab3bF9vMscIknZ2Vr3Mfp37Db9aSeyqcmfx7bWcUvbuemqJlLkD+pant8on1jFBcvFPLmEQh42bZqVErRl9uOzM6fPHutjVNHJG/6PIYt9tiVWoZCHjRtnoWlO56hxkFVJJXmppcZgXQbnRKr345y5l7NQvmF5E27nEzmdXMxjU3Hhks24NIKHo+FrpqE1QUeg8SSrqHUfY/TPrsXzxuscDV9jGRQVuMfZvkNfLLPyBV0y7x26YumZ1ko6eSOwNKvSz6LQMWbX354201nNO4WMUEI/Ecqwm3+nsjCWaeHPGWkG/omLeEWhY/yg/l+4hSctA/5KVzct2j3mIuH8gP756m0Y6dtXxe8N7RY13IybugK5+P1W0kmL+DpfK/57woNzmD9fSyq1TWwNydb+KftsdvK2pp84Rzic4fpT4TtHZ2E1R/o+k9JW0cTywIuOJc+KCwt1Lc9vVBCruGC5mCeX6mpfXD04V7QszOmdMH4zFhd3cTStzyz5JiP5dXb71wPt5JKvAoZpZy1AQoCb/Y1d6k2k0b+W1g8mBnl4p5jxcq6LeWwqLlxyGZdz5l7OYbl8Ur9HK1KrMJwyvU4LTccbT6T1xBYTpbXupazF2ex6Liebfc5lYSy133Lh4KsOGV47JJWim3tvfd32PX9sEZb7FUh2tln3KNsF1IYQlFVAbDwf85bQv+0RDlNroYtg/R7qeZT91KVlhLdtG8bz0n+w+YlPOPp6W+1zKuO0ik6CNLGGAxkWg63P3ejNfrDu11PyTFbkD+pant+oIFZxwXIxTy52wiaZmDCpT/x/sC4py3TBTxZLSRXVsBZxmjxlnOVdZjPGrJxfm3oTaZRFHh76vLlSXiHCjjd908nFPDYVFy524/J44wkeOPAhumMfoML9e+5d/TvqnlljBnpz6In3gk4ekSA8BLqfr21QlCGzaXW+ufyu58y93NIz1Omc7ChkhNY2LePCmNGzGY74zKwdwGa20+uQ5U2dpwsY5vHS/8VNLX/smHG2z/Bal3aDvQhVpkyssUBhXCNCIY9t2bdBGWcpIWq5zwp/P6Kvjy6Znefq9JG+iGtH6rU2EQ9jXOaOEoldjtsNsZje560EnS481LU8v1FBrOKC5WKeXCYTxNpdVJ0utpkwsycWvT65lI1lx+RW0Q1FzdSbg6kqTE6Fi3lsKi5crMalVWbTuJmvFN20yHsAplxabMwlBpPNxE4HdpnY7IN1/fMpo5cH617JGEBbLfgVMIxA2PaYmucqevGVFhLuK8lJ2dZufnb6bK3O01s4xt7RNQDW5d8JVTRNtNAtqpg/X/J2OMoA1hlmo7rnFg7aXrPAPvPrTOp1xEpJe2oZ28lSLAZpvfUXKkt7AaGu5fmNUidWKPIQX9b3ilL3XBVdDg6Dk79Yd1GZ5gNrkMmw/r1AoHHrrWOWN3WJCpOGt59CoUjmgf1/aKn2C4IuWcl68Rj/xp8xSwzjrFjshOQv+WHSI0GaKCTdK9P
FOIODWKrBTheWXqA5vL6SU4wHqnij7XhWAYmVavsYszIGsF6vJLizWPdBzlHZ1sozO5M/aU3NONu2DRMIaAghCQS0uIiR7jm7jzp8vIuhXOwixu08BugBbhcLkFIQDrsYoMTmKBIvg9zCQVtP2Qq6qKTL9vXO6NZqoFFFJy4bxf3Jj+XJMyiLaWz/iK0Cv0KhmHlUEKtQzBCGhU40hyTEk6zit7c+QKX79zZbOJcMOzE/gG0AWFo6+Rtaa3ILit2M8ySr+PuPPznJ81AoLj2ON55gybx3mFvu00WStPmO2w/KYnbRQERmKnl1QvBjbkx6xAiKyjiLERT5eBcPMSIRlxkMbdkya9oD2ZGam+nf9gixQAVSCGJxWxpdvMmZYqJ8rW1uTgtjuS34xRcm3eGc/XUTsQxIs9hfTc14kgVZLYdBTJy/xI2xyKHhZj913MZjFgG5jQgWkl6uiIfAHlLHjtcraRFfJ0gTxURTXmtkVZ2RCMropZOFaLa3rFNfhBU2QbgTEcq4vL5uxhZoFAqFMyqIVSimCcP7dO7cEhYt8rFx46y4oFO2F1jBZlr5cPvX6YrNz/GiKiliyPZZp1X7UMhDf/9kbwJyXQU3PAsnKCbKflazkiPM3nKXWtlWKBww/CwPiVo2t3+crlgAiYuuWCBLzVf7rVIDDTtOUckCTuIixgJOcogV/Bt/ytvo9jpuYgBpwdBkfGYT51W7YCG1UkMLVDhk/8Dwdd1Vek/OgaXhv5oNVZwi5i3hlR1PTbl/MjUgncz+jjf9ioXyDVzEWMMBS9/ZEbw2r07GurVF4BYxhMAMtJFa3OPWG/cs1rOquVw1epnDIVaY42q6KSbKBnbGbYxyQQCuGVugUSgUzqggVqGYBoz+o3BYzzpEIq4M9jTW9DKHLqrQ17hdTAR9Tpd8yXU8w+PcblHSpxvcp67af+2LXcwr91JeXkJ9far9z0wiKGQUP2cR8ZuZPawzLXnE0BC+4P3v0bkoFLnzwy8eZVH5MOXlJebfh8uH+fG1rTN+bF/jFmY3rMMd7o4HBslByMSckStS/y2667MKFAS6MJDExSkWsIb9tLExno0TxPAQtemhDIcFP6luMherEueieeVevvbFieAzdV7NNliINt9HsOAbFkG5HrweZCVnvVXc1PLHGd9rKlalvQUMp829xUR5oPThvPEaDYU8bOh70Pze7BV7s1sKsRtlGi40DbNXdz174/3S+jGLGSJIE5Wk9zHbI2imhfXsYnpLh6V5DXpUbOLxpfsoYHRSe5rMAo1CoZgaKohVKKYBqz4pZ7Itw9VLvYRDl2x1tUZH4C9YxSFGKcQs5/NJ2tqG6eiIpgWwjz9/lXnDOfVSrNxuKkYpooQoMdx0sjDJUxbAdTo8xfNRKGaGH37xKHc8fwsRrmDityPo5Qpu7WimvNzH387/h7RqgmyyiU4UhY5RVr0Ab/tjphJvF3Zqr9K2isOpuuMUVayRT/CH1c7VFULItAycbrNlNXdZn8Xqvu3cs3GML300eS6K4eHx568yA1mreXVoSPDt+jNc9sXP2Z7jSM3NLNt+LbtK76GSTgQaFb5zHCjdxFkxl+WBF6cUXHpdRk+x/jfbJ1lZJ5LKfR9uc/HZjpZpC2CNDPycuZfjX3x1VhUria/59p290+ITbFBGr+XjiZnqYLDIMtvbTAstfC19MaDAfsG2i0r+jH/HR7/tNrlS6T5NkGaa3N/BLWPc9+vl3O7aF1dwzrR4nE4+aEsoFJcSSp1YkRdc6KpxUzO5zxY7FUb7x+vqxti6NTlDMLfcN2mFY6tjCMaRKV6xmRBojAUWWKqKptr/nG8u9LGpmD6WlEfilRJO6L6TD9X9H6Jbt1mr2brHuUy+S0R7X0ZbGcNiSgwltwvYKQJX0sm3Sh/m631300UlbhfENEEgILkxcoADg39toV6crPb6/vdL3norXQXW75dEIrksfDkpx9pboLgZ582eIUdrmYOspLbo/4HR0fdMsTw
U8rBls4eh0fT5rtAdo3XH6IzYrliNASuVedAXKZ94vpoYbtzEWM8udnLXpCyHnCjjLEMUW/rCzp7t5Z57tLjnuLVX7f7SzfyNaI2PJ12XoaVlhObmIiIROxs5yfTlXhLvKZO91P+EF3mWGyzP3YmZVuJWTA11Lc9vlDqxQnGeyKVPavLYZzbsHj9wIP1mayoKx1bHkI5G8tYE/FFrVVGvl2jzfdN1cgrFtNJNRRZbCfawAe8Tj1MUOmatZhvz0Kv5zV7Wze0f53jjCcu9+YL3pwWwgKVYTjFRgsUtfLajhVfanmU8sIAx6WE8UMWrze08/KV/Yw/rqYpnJ62DTMFbbwk2FO/XPT7jmcW2tmFeey1KIJD9XOcyj2GFfTAcw01R6BgVrtO2r13PXg6P/BVCStzhbmY3rMPXuCXrc5sMwWCRZQALMBpz09w8M+WkVmPAqvUitcomhoc2NtLAI1NQCLZ+PkIZe1hnjiVDvApg/XrilkDW328pEe6IbosHq/o4GB7Wtw0G08u14+8Ye4/0yVx/BVZjcBBfhgDWusUnk1q0QqGYflQQq1BMA83NIxSLwfN9GmnEEtrbDCXTzExcpAsZtLTOSCa3oNhbOEZT0GOpKpov/WMKhRUVWfbxxXAjpMQXvD+rEsNBfDxw4EOWz9mV16/kSFIQUUUne1wb+NzDnzAzd+5w90SQt7kB78H9rORwXOnV7XBGggODf823++/i7Z37TCGhUMjD4CBkFzRI7qAti+3ScRNj9pa7aIl91VZsyihLNc9YSnPhYLJkKvvO9F1GImJGlGqPhq9NE9ICcIW7mVN+mfnX/vwirBYldlNvu+hh/13KuNq03bN6n2qQJjTcdMYqqKkZ10uIHS6FxWIQfL60xQCjpzRRifl8WOfoOH/PxWIoaRshJMuXW9vDKRSKmUMFsQrFNFBTM84eOXFDqaswnn8MN4XjjSdMJdPMQefECvUYs7iW5xJ6hKaKZFvruHmxV/6vj1vqgwAAIABJREFUiguJbyx9mgKGM25niCO5TocJlA5kte/u2AcsH9fmB2xfU8sR3vB9hJjw8HpgKcse/RSHqeUjd34G99BAUsAjRkcRY2OW52nFID6ax75hZvsaG4toaJiVkD1zQnIdT7OTuyjjXIZt01+7rvAJDg993lS1tZt7TqX0BRsLB5PBSUTK6C2tkKcy7GV6rYSKQsf4SXUT69mTJKS1nr0cYoU5Ux9mBQs5aWtBo+EyFz0q3WEzu95a95KtnVuFfwCfewSnSp/EcwEoq17A6bB9BjMQ0Hh4p6BvcJblFsYigaHEnO3yaO5jbGoMyuKk/5dS8MwzSplYoXivUUGsQjENhEIemtxb6aKSSrpYzy6bm933dmXZJSQ/qW7igfaFkxL1kLj4F64nSBMHWYnTqn02lJZKamrGKQod46mFd7OkvFf3uCzv5amFdyt7HUVec+P3lrN76ZP4eBen34KuogqytJRv99+VRTWDntn62CLSgh/LsnuAsjL62x6j9+TvzUWgw9SyZcss03YnNchIJZPa6ykq+WD4ecrLS2hvL8ih71/w31QD0MpmhOWinlUZqOST1WGuGf1X1tJuqtrm0koxWWE4OxGpluZxfrDpBf4g/DxdVGZlfTYdSrVFoWPM3lTPvX13W4ojbUZXwz7EigQFYOfvp9b7fV7Z8ZRp03PT1iV8bUdZWvmu1ytpCnrozuA7bJxLMy16sW9fhAphXa0QCEgzo2/XfpP6eIXbrpx8gjLO0crmrBaXsmNy12gl6qRQvPeoIFahmCKNjUXU1yffOD7G7SzlZ6ReED2MUxa3l5nOgFbP/KbvL6a5qOv7roOSaWYkLpppYSVH4sqQ6fjox5WNNYfQb85+cOfPuCP6Xbri2YUuFnBH9Lv84M6fqUBWkdfc+L3lnOwRtLUNU0iySi1o1PMoO7lLDzwlrBzbz2zezWLPgu7IbDbfWZgUyFqW3bc9BufOmZULRqbw2/Vn0gKx1LLbQ6w
wS1N/zI1cxzPYzUUC6DKDo0wCTckkzjkFaXOD0YubLBxVVzfGP77wPja7HknzmM0Wp8y1FUYJcdgmg9gdKWH92KNmQJ1sfWaP3f6ypaTpq4ixMdu52/BOtbJaSsVHv227Ri2H2T1r04SKs7/ftGTLVush8Rxb5D1prTVCSMLhiVJrK5siq57Sllij46JBISO0uu9mJUdoZ61ZCZXNtSiZxO/T+Xuzq154b3QxFApFIiqIVSimQCjkob093V5ijFn8K3+R9vg4BfQyB4nIakU/nfSbp2IxxP6C221teGJ4pizl1EUlUgh2s8FixVsyixHuYBeFrjHL1xv09Ql8wftpjn3T2noh9k3lE6u4IKipGSfcM0ZPzwA9PQO807aP8cACHhWbzIBBvN0H6CI41qT/ZkdjburrZ7Fk3jum2JNT2X1i/2u3TcDTRSWysDApa2csuL3AJ/ExkHYuAi0rNVu7m3pDSKiZFouA1GpGEhw7VoCvcQu9WmnG41odWwJiMJq2EGZazZRfxpx5pcwpvwz/4qv52he7qK+f5ShC5CZmESQKXGj4C+0z8m6nluMsEH0RAAdBJr0nNbWkOhUXozzUVmDZrmGMnVv6dnAq3id9amgutRwG4PrrxyGL61TiOa7kCHvkOir8uhWOQItn8CdKrQGz79UobU71MgdYHniBDey0uFZKylwRHq37d5bt+CSxQAW14iivB5bydlv7JFSY7RZpUq61RFnPrrTeYiXqpFCcH5TFjiIvuFClz/UVfLsLppO9RC7bJONhjMt4mz7KCPgiND3ko5bDfOTOz8R7Xq1JvSktYJhCRokyO+M5GNYB/sVXczR8DZtppZc5Sa8rJsoa2vmhbznd0TLLfQYCGl2nPbjluOWNhkAjJjycO5ONANV7w4U6NhXnH//iq3GHu23tcDL9/ouJ0lr3kqX9jjEujWOAg+2OO8wrO57iI/XXxTOrTkjKOEcvZWRa5xZobGAn+6lLCvSKibK7+MusHHrM9rdud2wNV1zVOPPcedvS37DnlaWIvkhyTjfBfsbOougQK1jFIcfjeL2SoSF7W5diogxSbLMPSU9Pdv3QVswpvwyR8Tw1dFdfa9XeSk7xjaXPcOP3llseI3HsJO3V72dPMMzG+gK0NPX55DFbTJQ9rDP9vo3scBeVuNDiSsnJZGtFY3x3Rn90F5VUiDD33vq6rSUVZLou54JGIACnw4IK0U2LvIeVHOEQK2gSD9ItK5gf0ANYJeqU/6hreX6jLHYUiveYqffBCNyMI9Dwc9aidy598WacAgQQK5jFrx86Tk3NOCM1N+u9TYX2mVCJSFIybWctA1zOQVaaj5fwLqkr74mrzNHm+6j1fp8SolhZE/yYGzk1/H7a2oZty8W0+QHb7EIlXTmXAyoU+Uq0+T6kEA7KsM4M4uOB/X/ouE2icm0vfqz6TK9bXc5Izc22mdpkRNoClTUSHwPsogEvg/jjbRKV7jCtdS/xmc6HOXfmHSpsM4nWLORkFlvp7/GpNz7MIWrT9XiHhphdfztz5pUyu/52S4sivafUWYRo27ZhKvz239MgPttMdC5WRE6s5IitcJHbJlsu0DjISn5bFzQDWFN5ubyEJfPe4enyDbgsAlgAEYnQfOeIRQCr790ona+k0wxgD7GCOfSwikNmpt8qgIXsr5tGKf3ywIucFB9kLLCAV3Y+4xjAApblyoWM2PRm21Pp/j0vvxzlTM8Ar+x8huWBF5FCsDzwIq/sfIYzPQNmn69CoXjvUUGsQjEFnPpgfPRndaOq4ULDzTnK2UddUqBpRy9zEGNjSaW3NTXj7PZtwba8jRidLCTmKqCThebK+UqO0MlCxv1X8HbgjzjILbqCJellXsZNhV2fVheVEIsl2SSklotFm+8j6P5ba49L998qn1jFRcNIzc0M3XobtVZ2OKzDn4WqqpO4TijkYb2YKA8e4DKsLFYM5dSAQ0CW+hrnIFbDwzgDXIbERS9XEGEOPjHIvat/lxR
kfLP6oMU86CQmtSDDsSfOLxx2saHvQUvhKgGIWMx2T3qgbo+RXWsKevC67UtFY7iy6u/MFen3m/9uZbPlfBmzsUmS6MrV0a3bgBTlZQRdsQBr2ccV9KTZ9oD+2UVilzucnaCYYe6tO8mytus4JFaynr30cgXZVBbl0j86GQV7q+vPo3X/zpMFt8cXalMCXHcsrU2mmCj3rv7dlM5DoVDMLCqIVSimQHPzCIWF6RdkN2O0lTbTWvdSRr87V1yIYgEnkYVFnHT9ATHcnGRhxuOnKnGuevtR22PFjJ+7pqWrnXq9DAS/Q+TlV7mhZxcn3rzcdpV5pOZm7JKllXSZzWCGTYKhhJkYCC/b8Ul2+75siolU0slu35dZtuOT6uZAcVER3bqNobrbqY0vFmm46WQhtYUhHlr6fQoyKBc7ZTKDwaI0uw8rjMxXU9DjWK3hzET2rYxexilIeV4wIEvY2P6nZi8vwF++sJmd1d9N+q1/gG6sMsa5tlZAunAVJItXpQZo2SFMm5xaDrPXdYetbVolXVn1d0JmH9pEBoLfQRYWAjaewKyj0ka9tyo+DxeFjlG2aIGl4NcoRfRyhdkbvZonzKogD5nHyCDF3P+MPl83yW9lrX7/XvWPpl5/btq6hGXbr40v1K5KWqht3THKzrqfxx+bqCbIlPFVKBTnF9UTq8gLLuReBV1tsYhIRL9JKC2VtLQk98gcbzzBxvY/tRA4segvctezbMcnAVhYfxMRi4xBGWc5RzmxQAWRl181H/cvvpo/CD9v2RdXRaeeiQ1UEG2+D1/wflynw2jzA0Sb78speAyFPGy508VQbOL9GL1RX6jzmRmAi4ELeWwq8oei0DHL31xjYxEH9nuIacY8kDwf7PZ9mc+cfDhtf1dcMRuXS2Zle5PYgxgKeQgGizh9WuAXfXERpeyCR2MOcRFz7HMVCc8bFii1rn9gaM1ac25obCziwIECYjG9SkRfZJvcurpAQ4tnJQ3xquSgSvet/SmfNR+ZQ088c2hPIKDRyQLc4W7L/Rpz3g09uzOeYyjkYdOmWYyNTXzWBQWS7dutA15IGDPhbhACkXC/Jr1eHl/+Iza3f9zynGo5AoWFuj8wMXL/bDMvKgghOXNmgLnlPsfx4GYcDRcBf5SmoEeV3yrOC+pant9MpidWBbGKvOBSmFyON57gnv3/w1TeFEjLC38Vnbzh/xi9r3VyvPEEDe1/ylhC8FvAMO2spbbge/Rvb0tTKv1B/b+wnj3WNzbe71vaLEyGUMhDy1eihKN+KuiiRXydL9zqvagCWLg0xqbivSExkJXvK6Wi71f8ngCpVjOgB3brxV4e3Flo+Xu94orZVFRoGQVsionycJsrKXAoCh3jn+/+Bc2DTXRRRbZBrBEs2gtVWVPICPuoo5YjDNXdnjRHzJl7OULKSYtfAVT4+zlZrIvOreGATS+m/rkaQbUsKOKWsX2O+xZIYsJtBo+JokWVdBGkieWBF5MWEu2orvbR15f+XZWWanR0ZCdyZLUI8pPqJu7tuzvpnFZyBOl2I2KxrASs7HH+7I3FkY8tgu7IbMttiomyq/Qebmr5Y1VlozivqGt5fqOCWMUFy6U4ucydW2KZRRFoxHBzrkf3lgyFPLQ0jxOO+PRgkSZWlD7FQMt3LG8KfI1b+Kf2QZoJJt3YrPA/xUDQ+jUKey7FsamYflJVcq/mV/yGj+Kojls4xrbWccvM1RVXzGbXriHurteSFqwKGWE27xKhjEq6eKD0YT7boZfbGnNJd8TnoGprj5GJPcQKbuFgTq83Xivdbs69qVsPHW88wQPtH6SbCvz08i6zGWNWwqsyB7CFhZLWVr2fcctmD0OjqWXOFq+JB9W6yrp9NrZSdPHG/GssFXwhWQU5E+XlJcyEirGV+rL0emFoCIG9YnV2SKo4xSkq08aL1yvNsulQyGPx2Uv8fkkwqJR7FfmBupbnN0qdWKG4gLATt0hV7q2pGeeXr8GZnignesq4oWc3vR2
dtjdO0a3bWNb2aV4PLCUmPLweWMqytuvofc3+NQqFYmbxBe9PCjQyBbAAQ6MFfLu+x/Q39TVuSXq+pmY8rVdyH3W0splKuuiikmbRQijkobGxiPr6WfGMmcvWlsUeSZAmQO/R3MBOsvEQNTDF4GK6mu/xxhNsbv84XVSZ4lACQVlc5dhFjGyyhyUFw6wLVvP1+oGsAljQ+0GbabEUTDIoJkqLvEdXmE7VEEC3oZmuqhYrsu2fNcT2YoEKpBAcLF7HgpEO3PF+4Ew+sk640PQ+bm8Je+p+Ztv3W1MzzrbWcfP5qipoaxvmtdeUcq9CoZg5VCZWkRdciitkoZCHu+tl3GdQxyj7XVH6FL0dnefv5BQml+LYVEw/RtmsgcjKCxV0cyxpVlQYPedWPrFg3RPqdst47JjpeMY5WXuSyhQ13EOsSPCM1oMezUYxt4yzQKoqcPpxjH7/bD8fw6u2jY1ZbZ/4uhhuDsdLhE9RiRuNGC6q4p91LUc41/OubSlvLixa5CMSSV848Ps1XnstOZA21IQTxZgSM592GAsDid99qj94LgjGOVDcwOce/kRO71fNmYp8RI3L/EaVEysuWC7VySWxnM64SbXqdVWcPy7VsamYXlKDzeyD2AmKibJH3MENZ9rMcZlaTjqV8tEqOhnAZ1liW0WnqZhud9aHWEEd7Uk9/AAexoghkDa+oclIDrIy6z5OFzE0BLkWlhnvx+kI0uXi3Ftv57RfO0IhD5s3z2J0dOKIRil0amC6eLHPstc5UaDLiiXz3qErli4dP5VAttLVzYm33pfTa9ScqchH1LjMb1Q5sUJxgXHT1iW80vZTxgILOCk+yPLAiyqAVSguIIpCx/Avvpo5cy/Hv/hqikLHLLdLLUu9iv/E2mbGnkF8NMlvJT2WWk5q5+GciWKiBGni7wsbrT2cacroHruSI7SzFp/PsBWTuIjhYizLABZA0EwLZVl46ALxzK/drYzE6jMtZIQgTUifvS2MBIbWrM3qHLKhpmac1tZkKx6rABYmLJGyfdygO/YBy8clmCXndlZBtvt08ClWKBSK84kKYhWK84wyUVcoLgyON57gw+8fo7y8hPLyEhYFJD/Y9ALucDdCStzhbmZvucsykE0NNl8p/TRXid8wEWhJ3v9+aQY5dgFtt0WQmjiH2Hk42yOpdHXTWvcSN/Ts5jPhVh5uc5nnUekOs4d1rORIdnvz+dA0I9zVw6ZRvJlelsQpquJlx8k9ty6XxOXKvirLLyK0tQ1TWjoRVJdxln3iNr5Q52PgoVbLflfpcqWpKFuR7QKGgZ13dip2egl2jxtUuH9v+bgbzRT5W88uChh23E/SPh18ihUKheJ8osqJFXmBKvNQ5CtqbF46JHqozp8vaW6eUFa183o2LK8Sg7xU/+bJ8LGFI3RH0z2iK3zn+OXJIttxadVPaY/ktqW/4dvfs8/epvbyZqLK1UWXVpH19pnRjx0I6N9HQ8OsrLxxBRoHfBv43EPO/Zy+xi149+8DTQ+YpU8PbjMtJtqpAk+H2NN09sRaeZGvLv5Hdg+uzlhiXMgIj/vusvQpdkLNmYp8RI3L/EaVEysUCoVCkQOGCmx5eQkNDbMIh11IKQiHXWzZMstUhX3gwIfSAliAMWbRTEvSY67T4SmfV9NDPrzukaTHvO4Rmh6yL4EFPdu3fPkYIoPSMGgZA1gALcfU7vSXnwoCAcnLL0d56SU32cTThtjTqujetMx4Yva0rHoB3gPtCE0zS6Vd0SizN9VnzKqmqk0DiKEhfMH7J/Eek6mpGWfbtmFbNWA7bv74G+xxbUgpHU4O+Afx8UP/ana2jeAtHEvZQ0rG2r2Ozz30iSm/H4VCoZgJVBCrUCgUiouaVLuSxsaitMAVRFqGb2hIEAzqgatdvyGQ1ofqFPiZ51JewpJ57/B0+Qb8i6/meOOJpMdn19/O7su+QoW/fyKQ2aFlZVnyzDMepI0naaU7TFvbMD09UTOAdSqLHbn+Mxk6dZO
xK2m1Lo+WKX/WnD4taGwsor29AOeuXEkVnTzJKnZyF5AcWPoatzC7YZ1Z/u3qiyDilj+JiLGxjMGo3ULFdCxgQPalx4n4gvezUjuo2+LgRrO5xTt9WqTZ4gQCGnvrfsZ4oApNuDkT+BjLdnxStbcoFIq8RZUTK/ICVeahyFfU2Lxw+Otr3+ZnHXYBZGLwI8leGVjS0zNgq/wKui1MCVG6qKSCLlpoYnngxTQrllDIw6Y7CxiLTYgcFTDM7TzGfuqSykANu61a7/ctS1SdxuXc8hLbIBYkpaUgBPT1CQKlA3y7/y5Wju2f2CqhLDZVVdkJ6fXy+PIfsenop5JKYXV1XN2d1gis/Jzj5sLjHBhdkVL+mk4goPHmm4JYzPk7EzYWP1II+nfuZXbDuqxLo6UQnDvzju3zdp/LdJSSO+FU8p5a+m2nVJ1J5XiqqDlTkY+ocZnfqHJihUKhUFySTASwwuYvkdysbUIhD/eu/h0eUssvASS9zOEUC5C46GIBqziEJ3yKy+vXMre8mPJ4dvUrDVpSAAt6OfJu6tMCuUF8NNOSc4lqUegYFcIu6NStaPr6XEQietl0d2Q2t4zto4FHJrZKOKZTZtEQQZLowVv/tke4aesSti9/jkrRBWgJ9i4uNNwUM8hBVnKOcn48dkPGABb0QM0iYZpGheu0+e9DrGABJ3ERY4Gri+NNv8qptzdTGXWq2jToQXy0+b6sj5ErRq+sXcl76jkHaUpTmvYWjtHcnFymrlAoFBciKhOryAvUCpkiX1Fj88KgvLyEXIPTbDEyV4sWeohEc1PaTcYuA2z9uJFZtMoK2o1L/+KrORq+hvXszSJATESjjF4ilMU9q5tZ1vZpZm9cj9C0tK2l203/jt2W5aZGltIuE2hkrk9RRebvTDIeqKIo/AYxB5seN2Psd62l1nOMw6M1rKU9qYe5kBH2UZeVyrIsKMjK6qwodAxf8H5cp8No8wNpmffpxs4/ttId5pUdTwGkiU0dopYmgnRTScAfpSnoyao0eSqoOVORj6hxmd+oTKxCoVAoFNOM4c/ZNzhrinvKLciujNub5CKu5DodZiVH2MM6MvnOprySXq5A4uIUC1jLPn6w8TkOa1+ayGhykkOs4JD7FubM6ufy+rW61dAin5kNNM4B0nuFDYzMdTafR6Xoxh3uZj27HN+PGw20GNJXwmbXjjQRrlGK2MCutNdJQPOVmF25Wqk/a6/u99oe7XTY+vPqjn2A2Vv0HuBEG6dYoIJlbZ/mRE8ZZ3qi/PI1ZjyAVSgUiveKbN3HFQqFQqG4JDH8OefPl4RtAomp4KMfiTutJzZIU84lqtr8gNmrKZA2vbGZGaWIDdqjaAnndYoFrOIgIqYhoxO3D5GIYPOdhfiavsqqtx8FlwtiMSrpsszEZhvMFzPIh+Rv8TBGDDcgE95T8j5GKaKZFmrfPkqvLLXc3wCzOcQKMxsrhWDo1tsy+sHmCxXu05Z92ZV0mSXgymtcoVBcKqhMrEKhUCgueD5ZHSa3zKMV6a8vKJBmD2Fz8whe7/S2trgZY1fh39Ba95KuFBtXEN7DepYHXrQUdSoKHYMFCyzVhI1ezWZaMvqAZmKA2RYlyS6kxfr3aMzNvX13I6RExGJIrHsynZH4CkcRQlLh7+dPeIFnuSFeRizix7YPgLuo1ANoW4RphySB/p17bQNYOxXpTNY7M0lLrDHt8zQWO2D6lJEVCoXiQkAFsQqFQqG44PnHF96XEMim/k34X/p4lzLOAhouYubjZZzljsuO4PdPbFtaqrF9+4Q/Z6J/p54VTO0VzT3AfR9vs6L8p9y0dYluqdIzwIk3L+eGnl2WWbWi0DG9dPTUKYSUuMPdSV6oIzU307/tEdtS3onznH6dicRjGqGml8EcjieIjhbg9UqCsol/4S/IRZSrki5ELIafc5nPsajI8rP1L76ap8s3cHeD1AWUEHTFAqxnD3eFv8pH6q9jbrlu1ZRYQv1esDzwAntYZ/rAVtH
JHtaZmWWj7LwodIyfVDexpLyXueU+Fi30sGjRhMVU4nk72SspFApFPqOEnRR5gWq4V+QramxeHJgiPCm2KKNLP8W73/vnSe3zeOMJHjjwIbpjH8CF5ig8ZIdAIyY8jnYuiWRr7WInApQbuVgRQRWddLIQ0NWBcxeXSiSGvs5uLYRVzKClJdFKjnCIFaziINbr9JIqThGkiWVt15mBbFHoGD/Y9ALNY9+wFZyaUFrW8Xol27YNv2d9psYCRqJwk4FhiwTwg00vsH7sUdvPXgiJlFDhd7ZXmixqzlTkI2pc5jeTEXZSQawiL1CTiyJfUWNTkQ1z55YgZe79p1V08npgadbeoqleoAapCsahkIf6+llMXbE5u0DWwxhPsMbMCtopE08HbsbZz2qaaaGLyriaclOS8nADj7CLBtuS6mKirC7+R37oX83p04JSIgzIkjRBqEzMtOdqKkmLMW43xGJogQpTGdm/+Gr+IPx81p99YvBvMFWvWzVnKvIRNS7zm/MWxFZXV5cBTwJ/AIwCvwPu6OjoOFtdXf0/gd2AN35Sqzo6Onqy3PUCVBB7SaAmF0W+osamIhsmk/ksJsqego0s235t1pmvbDOxAIsW+YhEMp2TcW21C1Q1BGTsry1gmHbWmsGQi1gWPbm5ZXqN19TzKDu5K+OWh1jBHewiymzL46RmVieDEJIzZwamtI+pkGrz4wp3487qs58gMYMO+rdyrufdSZ+TmjMV+Ygal/nN+bTYkcB3Ojo6qjs6Oj4CvA48WF1d7QIOAhs7OjquBJ4HHpymYyoUCoVCkRc0N49QUJDtYqukkk52ld6TUwALE8JNSXuzUTAOBkcoLHQ+p0rRzUFWWvT36lTRxYHiDVS6utEDWuvtxphliiYdYgUum+3cjCPQqPBP7mZyFsNJAawsLEQWFCRtc4gVLOAkqzhoG8BC5sA8Ebv3bShXnw+M8mJ3uNvsj0YI05opW05RlWShZOxboQBovvZ/8/7yYsrLSygvL2HhPE9aP7jZW11+GXPmlfJ0+R0smffOeesfV1waTEsQ29HREeno6Hgu4aH/AKqAjwHDHR0dL8Yf3wUo7XeFQqFQXFTU1IyzffswpaUTwlDWSG5b+htO9JTx2Y6WnHsPDeEmqqpML1C7HsaamnFaW4eTxKoS8bpHaJFfYyVH+DTPpD1fyAjBgm/wuYc/wSuPPo3mLeFJVtm+ty4qzV5Yq/5gr3uEHW1jpmdpIJApAEx/PobgECuQ6Nnn/tad9G9vQyv1I5noxdXLae16anPDzTgb2GmpDGwoV58PfMH70/pjhZSTUIUWpj/wevZymBVJYmGKS5fma/83ezv+HA03xK2tojEvd9YX8JPqJubMvZyy6gXM3tygL6YAh2M3s569dMUCSAThsIstW2apQFYx/Ugpp/XvyiuvdF155ZU/vfLKKzddeeWVNVdeeeWPUp4fvPLKK/1Z7m+BVCgUCoXiAuNg/Quyyt0tISZBk6BJN2Oy/rrfzuhxr/pAr3k80ORVH+idOKeDUlZVSSmElFVl/fIgtVKCPMgKWciwBJn052JUHqx/QX9xVZX5RBUn07YFKas4afucmzF5sOyu5M/ooJSCmOX2oEkf79geR1ZVpb/5gwdlFZ02+0vff3bb6edofE5VnJSCmKzipDzIipn6GrPiILXp5xM/6cRz9fGOdDGe9Xuu4mT8H1Xn9f0pzj9uxmzHSRk9lk/Yzg9V5/vdKC4QFsgsY85pF3aqrq5+FJgP/BXwBWBtR0fH/53w/CAQ6OjoiGSxuwWonthLAtWroMhX1NhU5CNW43LpR0d57S0/ydlHyaL3R/jFfd9P6p0Ug1FcEf0y7CTCFAhoXH/9OAfaXcRw4ybGp3iWn3ONpTrwLRy0LNO1UmEuCh3jr+oDPMsNSecs0NjATtpowLpgTEOzUXSeW+7Lqkz4/QloAAAgAElEQVTYRz8SV1bqyZXuMKdiFWmPT1UAaSqEQh7ubpAMymLzMSuRpsm
pREvcxFjPbu7vWZ3zuak58+KhvLwE+2oGSRnnaGVz0piz64c/3/3jalzmN+ezJxaA6urqh4APAV/q6OjQgC70smLj+TmAlmUAq1AoFAqFIguON56wCGABBK+95Td7JzfK7RSF38AdOYeHMRpw9pQNhwXt7QXx8mBBDA/PcgN/wotpfqW13u9TQbroFOgeroaPqUHjJsm/cD1WAeyj3GXxXhK2StmXQYX797bvJZFBfKyhnUz+td7CMe5d/TsaXG14GEOg6Z+bq82yDxn072LR3FGzh3DRwvQeQidCIQ+LF1v7uho0NxclBbDGe2qK9yWb29EyCZsj/Xtuo4HGxtzUmhUXF25iDs8KermC9ew1e6kB257s89k/rrg4mbYgtrq6ugW9B/bzHR0dRpPILwFvdXX1NfH/3wD843QdU6FQKBSKS53jjSfY3P5xnPo/xdAQDTxCGxuTAtI2NuIjU3YkPTB+juvoZCEabk6ykBWlT9G/7RG+sfRpi35MySmq+KPx/898JBTysHtsbVrGRuLiR9zIocI6xzNKDSANYZmW2Fez6getoIsfc6PFe5s4kyo62Tu6hpfbX6VNuyP5c9Pu4H823hAXr/GxZN47HG88wfHGEzS0/wkRWYbRQxiJetl8Z6FjIGsEruXlJTQ0zCIcdiFlvJ9ws8fsP/QvvprjjSeIRKzPu4vKpLDcaYEiM/oCxtxyH0vKe/lJdZPqk70EMEWa5l7OenaRaaFnEB+rOISIi4P9JT9M+w16vfK89o8rLk6my2LnauDXwH8BhsrAyY6Oji9UV1f/KbrFziwmLHbOZLnrBahy4ksCVeahyFfU2FTkI4njcsm8d+iKWWcmdSQSFx7GLAWXBDEKGLfwSHWywNH3KYVg6NbbiG7dZj7z/370m6x961vEKCS1tLm6WuOFFwYzWBJlsv2R9PRMBN4//OJRvvH89XTHPWP/kh/yD3yJCHMs91FMlN2+L7M6usuh9HjCysfFOBK3zXlO7L+QEUoYIEKZ5R7tPGVDIQ9btsxiaMh+EcKwwTnECtZwwPJ7BD2T/SSrqHX9A2gaC6fVr1cyi2GKidKHnwr377l39e+4aeuSpK3UnHnhYiheG4Jhh1jBbTzGCF6yFUkrZITb2MuP3J+nOzaf+QE9gK2pGZ/BM8+MGpf5zXnziZ1BFqCC2EsCNbko8hU1NnVCIQ/BYBGnTwvmz8+Pm5JLmcRx6dwHKrmK3/AqfxS3ibG6EZUcKN3El9/+Br3SD0AZ5+i1CQJBV+wdCXyQaPN9ScrIRaFjzG5Yh0vGbI/V0zPA3LklSDk55eCiIknP3+/DF7yfo+FrLHo+JYUMcRv7+BE30kUlbjRiuKiii2DBN1i2/Vo+svEGurT0XtfE9zhOgcPnZoV94J/aE1gUOsbxpl9R1/dd26DUfG08OM2mv7WKTk6yEC1QwV3hr7KLhin74dqjcdvS3/Lt701kfNWcOf00NhbR3p5sJfU+zwD/lV31fNYk+lBPrp9ap0z08sbO7+esvj6TqHGZ35z3nliFQqFQXHx86aNd1NenlDgqy4Tzzte+2MW8ci/SIWO56P0Rft72H0iv17a/ze2SrBSHOSvn6NlVXJyj3KEfTrKu+CCRl19Nu0n1Be9HZFgcX7zYx1TWz8dGNX6w6QXc4W6bnk/BKMXs4Q6+Vfow77S1MxL4IJrw8HpgqenNe++a/6aAYdvjxCyzr5MnsSewKHSMezaOsaavNWMAC3qfYbb9rUYJcfv1+9lP3QwGsAAuHn/+KjUXzCATAaxI+nt7vIS55cXT+tm7TofNf0+un1qnV/qZXX87ZYsWqBJ0xYyhgliFQqFQ2PK3HzjKv751FanZpaEhQTCoRF/OFw1/8RqPP3+V2aeZSjFR9tY9x/P/WWh6y64rPkh6f5tkvdyNKxLhMCtYwElc8d62T/Gs5fZX8Wu2Pmwd4CbeBNuhlxFP3r9Vk242j/0d4NzzGaOAr/fdzUjNzURefpVzZ94xA+/
GxiI2HPgUYxRh1/NnBPEl5Ja9sQqMC90xsycwFPKwsOHztGl3ZBVgFhMlSFPW/a2VdIHLxQPtCycdhOSGmgtmkgMHjAA2FYHEzZY7XVMOZI2ebLccZwEnOcSKKfZT62fsikSU57BixlBBrEKhUCgsKf3oIvaO12EXcJw+PflARJE9Vmq1u5/9EHblupXuMK11LyX1Ko7U3Mw3O2uoqxvD7ZaAxO2WbCjez07ZYJYOnmIBEhenWMDPuYbreBo34/r2jLOheD8/b3vJtkzQUA2+iv/EKgCeSvCaiF7qbK+EatBtcSNuZLZiscTMlkVwzy4AdrEBQXal834i7Kz7OX7RG9+nxO8bonXHKDU142b/64TwkzNul8aego2s5EjG9woTAa/QNMv3nvj+MpN9ulzNBTNHzEkgGBiKFdHUNPlFBGNMhsMu87e/nr346Z30Pss4Z/5bDA0xu/52/IuvVsGsYlpRQaxCoVDkOaGQh+pqn2nZ8f736/+1s9+YDnyNW3C/9XvHkspA6fnz/LtUSLrBjJdy19fPQnO4fJ948/KkALYodIyfVDexpLyXJ9o9fODyAdrahnnzzQF2Dq0FrEsHB/Hx31QzEvggPT0DvNkzxDc7axz73KLN9yG9Xl7ljxMCWeNv+gnS5LjvgH9CSMlYDJgozUzECGT1YN0QdQJYyRGeZDU+3nU8VgHDbK37T27auoTXzhTS0zNAT88Ar50cN/vHg8EiRwGnRLxeyY5HR1i2/VpigQqCNFMsBtOOWcbZJKsjw7PTOejNfA4lriFAy+pclX3KzOHOoqq9r09kZYeUqDxsBJVWY9KYC6yUxjP9ngsYppXNSY8JwB3uVllZxbSigliFQqHIYxobi6ivn0Vfn1GCKdA0/b+G/Ua2gWw2/pMAl33xc3jbH9NvPBz6IoOyaRLvSJEL1kGPkT1MRyQEHaGQh48tHOHy+jrW9LXSFc+ydkdmm+PGyJzalQ52UZlVibCBUbocC1Twa/F/ESudgxbvs51OjEzPSo5wHU9jdVPtZpymoD7GExcDnAI4iYtxCswA1mAlR5hDxOa1kjLOsbPu52lKvalkzljqAUIgoLFt2zA1NeNmOfQNPbt4eKcgENAQQt9mZ93PORP4GLF4v29tPICFzAG+03Ne9wh/96hg39LH8XMWp+AlsVRaMf2sXj1G5kUg3Q7peOMJ2y0M5WF3uBshJe5wN/dsHCMcth6TEcrYwzqq6ERfzDAqKazmHwnxhZR21poLKWlnOTSEL3h/hveiUGSHCmIVCoXCiYYG5swrZU75ZcyZV4qvcct7duhkQQ9rhkYLuHNjUVpAmrrifrzxRFpGL1WcyQh6Zj3/LAWMIYjhZRCrUsvreJpVbz86fW9WYUmuZZo+9ExdKORhy2YP3dE5YBFEDo0WEAwW6ZnTwkLbrF0lXWagmy2JPajS5zNHb2KJ4VQoZIRWNushlRA8FVjHuup/xUUMI9DyFY6yo20s5wzoIVakPSYB6XY79AgKfAxw88ffyLh/p4ylQFJXN0ZPzwAvvxy1VP+uqRnn5ZejnDmjb3PT1iVJ/b5aYEJteSVHHD9zv18PhAUSv+g1M7oV/n627dCoqRnnxu8t57WeWWZWeW/dc7al0oqZYevWEerqxpgIJO3GkOCO9qX88ItHLZ/1Be83rXMA3TdauwO760sF3dRyhNcDS/H7Rmy3M45dKcKcZGHSQooVuSyKKRROKIsdRV6gpM8V+YivcQv/1B6lmRa64h6UQZr4Qp0vyRdzJsgmgE1EoCER+H3DMDZG32iJeb4rOUKVOEWXTL8JN7wrnbwqdX9RPRByE2M9u9jJXcQCFURefnWqb1XhgLOfajoCjSfFatbIJzLbtsQtX4pCx/jnr/yCO6LfTSopLibKnoKNpppvLpiWTGHMcQiwlvYUP9pMnrCpSA6ykpUcyWn8ZWvpU0w0qSQXIBaoINp8Hx+pv44uG89VgcZYYEHG87H+nUn8fkkwOHXbKiufz/T
PHAoKJNu3D89I8Kmu5zPHxxaOxBemnJD4fcMEHyLp+50z93KElBxiBc20cIoqnLyYyzhHa8H/Ytn2a7m8fq3DtjoCjbfb2gE9YHaFuy1fcb6uG2pc5jfKYkehUCimkX9qH2Qt7UliN2tp55/aBzO/eAqEQh6eeCL7ABaIZ9oEkaiXyOhl5vmu4hBz6KFLWmfTjFIyp0zVOAVU0p1Uaim9XqLN9+X61hRZUhQ6RtmiBSwL7ySXflI/vdTJx7KybTGygiM1N/OZkw/zcJuLCn8/Ao1KOtlVes+kA1groRiAfdRRRaeZ8WtrG6atbZhAwMgyOVPFKVZyJOfxl23P5iA+mmkx/18WFppeuN9Y+gx2PaKVdOGK+2s6UVMzzrZtw0klwW1tw7z2mnXmNVcSy7mlEKzw/YB9rKUsoSS4tFSbsQBWMbOEB8uy2Eq/DtTXz+LDFZjVNtr8QJKAW6bMai9XsH7sURo3ZffbqaQLX/B+sxKjv+0xpNebtE02v9vGxiLmzdN1H+bNK8mq11dxaaIysYq8QK2QKfKRD5cP08sVFs9IKvwDNAU9M3IjmGv2LRv0TG36PoWQ7Nw5TEPDrAyZKo2Y7zLE4CDa/IB5Y6+YfopCx5i9uQExOsoCTsZvOLNB4qOfKJdl3NJbOMa21vH3dPy6GWc/q6kVRxm69ba0aoZM1QdGlrSWI0i/n4Hgd7Ieg3YZUKtjCTS0uKCZLCigf3sbIzU3Ewp5uKvewzgFSdsXMEw7a6l1H+Pcm31Znc/FjLqezxyTuTYUE2XH0sPUrHDxkYbrLStynLC7dqQeYw/rqBVHOXfmHfPxotAxPSt7OpzVdcN6DtDL7LdunVrf9WTHZa7vQTE5JpOJVUGsIi9QF71LF7Ps8LRg/nxJc/PUS+qmi/LyEpxXqyVlIsKDt/5nRkGXXMi29DF3rG/a9SwYGW6OJLct/Q3f/t7UvAMV1iTeKOFyIeK+Gi5iWYsilXE2bj1jL/okgQpfhKaHfDP2O3Mav8VikNZbf2H+Xsz3He5mAZ10UWXxKkkVp8zSeNDLZJtooZtKAv5oVgtKxlwTDgvcxOLK2+nnWUUnnSw0/98of7QLIMo4yznKkcC5nncdz+FSQF3PZw6ntg8nXMR4bOk+bnv+duQ02VzpJP82p1oqbDd3GK0PU2Ey4zK1PB/0bHL/tkc4TC3NX4FIdBYAZa4+Hlzzf6b1XuBSQgWxigsWddG7uAiFPLQ0jxOO+PDTiwT6KMMv+pDFxfQNzmL+fMn1149z9GhB0gXZ65WmKuf5JnMQqyPQWLv0t9MW4Nmvtk/VZ9Mm8xTPxt7dIBmUxbavdjPOmz1Dts8rsicxeDM4HO9V66LS9Gh0CkoTEWg8ySpWcchme8lBcQvLdv75jGcRMmWLjD7s1BtEQQyrLqfEzChglkQm9u9mm1kOhTyO49yqJ1YKwbkz79jfYMfPT/WI66jr+cySuBjjQovbbWWeIwoZwVdaEFe6TyXX3nRj64SqBaC/7bFJzS9FoWP8892/4JbB3TbnIOnpee+DWP/iq3GntAkcYgWreBIsPvdCRni07t9VIDsJVE+sQqE475iqqJHZSFz0cgURrtD/LcuIRL2mOm57e0HaivLQkCAYzI8eGL9vOKvtJC4e///Ze//oKM4zz/dTXWqJVktgtUCE0JLAs7ki8fXcudjJnNkYnD0exznZOPaMPFxABBDGgMQPTXDWiqVxHE8i2XhjZsQvYcDIYAQM454J40zWP9YzDiS7swnh5EyuE3OzCUJqmyCklkFqCUldXfeP6ipVdVd1V7daSIL6nKNj011dXVVd9b7v877P8/2e+TRvVjRkxQOvsXGYfIx1twJRchlfAGnlNzh/vkxlZYQDsmqnYD5pmMwz1sE+avB2MngfC7mIiMQcuqnmsFZ/3cucWCq7vUGlTHI12mJ6b0oAC8r96/FYTzwHgwKBQI5BLbWdFZZnWip+ZLgjzTxtVbXlVDQ15VkEsEq9aASBOlp
wIbGAi7SzQlNntqqrLaMTWRCcGnGHm4KqUN3dPcDvuwc5WP0exa4QqWrKR8hDEEh4NvOFQY6xiqNF2/CIxpRdIYVPsF7VXC7yZdS+eOu380bNu6wb3MX4JmmzT7ySshLAHgOLLI4R8vjO0U/dnINzcIJYBweH7NLUlMfQiDv1hoBVh/Xhh4KpKfvNpul7YF9Ux0VNXxN31zzA3BIv9yzCtn9rPCs5zgHWawI45XTwGqs4zHrcxAfWifY3uSa2OB6PzOrVowkDGI9H1jwel/vP0sFCS29Ya89Yh3TwNj3H8aFHNYEVNWgdZUbG+xRjg80W6sjFOBDNZZgXqn950+q4VPEiEatVUYHamjxygh1aoNhIs2XN9tN7ig3WMVZWN3bsiKy3UbwvR8inNzbppgqjPfTRq8yeO4uvdB/G7Hn7Mj8AWXbq5G5Dknlv6/uw4kULKK5YoFm1zS6ZaejXtG1LZjL7E3co/437K65YYNoPPrLjXn79ezeHlx7SefqaEwqB54Ya8Mr4fFFe2ifwxe79fOlCMzv3RDXRsTIxyCb2kU/YdF/5hDXVcdnj4cajf552n/2Dx07y6bZGVtGeov2bnGzMeHsxRfQteejUJX1yAo/IQY8TxDo4OGSVdH0tzSiVL1FYs95gyl64fetND2QrKyM88MlfY7cDDVNIZywo6QoVsr0uJ6NA1tv0HFWcoIOFRBHpYCFVnKCKExz2bqPM1aUpyG7KfYUyXbB7jCqG8XKMKi0ILhM62bX8PXbsGE5QRtWnbocbn0UWBDaw3+ScZdYuvZD2uTgk4vowaLqaOB6kWHdexQmDAnAZHbS0Rm96eltlZYQjrLEcACsB65hy8SWLwFSWlX2p9yZg6WlrR4HYrkrxGALvRv8Tm+Vd/LeRPyVx4k3gh3wF4KZ6SDtMLk8/1klJiZeaGqP3dl3dDE7Xn+OthU+ysOar5AQv4ZIlSkIfcKLvIWWqRJIQQOnXttXgrd9O4fatSn8HCNFobErF+OfqC1FYs97yPvvK68vpoSSpN7AAhORiba/xmVB6H+Jf7nmLvZ6nOICaoRONpfwrAbAnli3UzkpmD13C23aAnOAl5shXOBm8L2Wf/fRjnaw7sz5mW5Vce6KGfUnenzjCjc/S7l7DAi7iQorZEiWnVPzoJhyZAzg1sQ5TBKeG5tZh/Mq6MjXsZR9bE97JZs3Z6fpzfOfop+iSPkmp+BHPrP6N6UB/zpxCNs77R175/cNx6bT2fS1LfQM0yQ2s+nivQd3QSvVQ9fNL3JN5zdHskpkpj8butfPWb8fz6itslndxgE1IiIhIrKs4Q9PZz9o8Z4dk+BbfhTvYYVuwyQ7xYkQqk1mn6Vt8FyeD9yWp0x1DJGJqC6TWz0Ls3mw7xPFx1sRmIowjEiEa80qOR60LHE9N4K3Erd6fP/1YJ6+c+QzJBNQEIkTJNbwuMsodfEyIYoOHt+xyIUSTp+3qkQWB/n0HE+6zvMApCmvWc5wVVHPYZGXTWthPfcb0+9Jq9kURJIljRVvYGN5pyLTKZZhIrNhFTy7DHKaa5f4fm7Y/qa6hngKuc61oIb0XOrTX4gUhH7rz17zzk1lKf04nzTSwUjzF0OpqTQU91X1pOGdBAFlmM7vZT63tttqpic0cpybWwcFh0mlsHMaTOzqOPYytbMQTX59iRbIUL1A60A1t99Mp+ZFx0Sn52dB2P08/Zr7C891//1NCrUeI+MuJ4sJLOgM0ga5QIZv6XuC4vFxbVTbMvsetNsenMKlY1hxZFbvqsHvtwjt20r/vILv9LzIq5BLxlxNqPeIEsFkk3PgspRarifaIq2nTpfUZtppkL99w47Os9Hyfci6l3FZCJAdju5FPWEt1h9i92XqIFb63OMATWgZCqa/ftl2Q3qc1nRRFCdFyBbiMTtpZwUIuckdNtWmb43Dr8OqZCpIFXzKuhKAOQMJtSFXfwEHaWQFpBLAAgixT0PhUQuqut+k5BJRsjLY4b+DiJGnG8dl
Tas2+tjIsSeDx0Cg0J5QKjZBneq4j5NFIs6HfyQuc4s2KBhaV3LAdwILM1zjGQPOL2isGH+rYCvgrZz4z1p/Hru1xaRmetkO2MiQSzlmWOc6KtAJYkJ0A9ibjrMQ6TAlu9Znb241AIEcnPZ9+enG8GqmKnVUls5UWveJxIJBDbU2e5YrKvlajxY/+3lQHCieD97GGI0jYrf1VUH0yqziBLIqajYqeaMz/MkHWXxBMvTUhuyuxDhNPXuAUf17j512+SOrnQ/F+ncGwtoLzZX7AD/kKnZQZV3QAXC6IRon6S6eEn2Fe4BSnG37Bpr4XbKRPyxTQT5gCyujkO0Uv8aULzRN2bIFADnV1MxgZSd1Gqc9u/AqwMTCYmirrN5tbvT+3q1pvB6sMilTEr6nKHg8MDSUcVbtO8dxFNGW2A5gr8kJ6dl+g9Kej/gWEG5+loPEpToQeMnl+UlPq6+fnH4z92262l3ptZVGk53Kf5X2ZFzhF4ZaNCf1xeh7d5ivaDvZxVmIdHBymBJWVET64GKG19QY+n7riIQP2Vj/MVjza3WtYOPg+c+cWcM8ieLOiwVRAoqkpL6nicVOTeQALygy6mcJpXuAUby14krtrHsAd7KCOloyUeiVyxmbfTQJYACEUUo55eZVWAwjKzLDnZHtCjZGdOmEZHOXUKYS36Tne4wFSDYQFotSwlwFm0UOJVh+9j62GeumVnCBa5KO/9RA9v/+Ynu7rhM6/P+kBLMBw5TK+dKGZl1pdlPr6Sd4GCAyRTxSRi57P8EjzH03osVVWRmhpMbZRIsMmxyezgf1UcYIDYg1lQidj56GvWhxjKqmsO2SXbArcXaKcWnan/bmEyuyhoYSMHNWKShWPUwJYc2E/vQiVyySABdLOHikixMK+89xRU82doZ9TR0tGOgDBvgLDv+3qbmgCcCZ97en6c9w77xpzS7zcXfMAx6XEttKqTh8SVZv1AokONw8niHVwcJhQ8vOV8pJSby/HWMUxqkgeyMoJqZHtQhVPCAcV2x7ZPD1XDeasOjj19VQdYML77e28se0sGwf/RhNtUqxPMms+B/FSRwsLY0IRC7hILbs14YiFXOR0wy/Ie+ethLpYYWiIwi0bDcH76YZfGPbVzoqxQ2cFC2IWLnc1VU9oiqO3fjvPlhxhXomHkpIC5pV4aFzyswn7vumM68NgykkQkQivCavZ66pLeE92uaC4GFkQkPyl9LceovdCx5QIWq2orIxwMf+ulAItEqISkO/cfVPOp7IywgcfKHYl3d0DXK/eQg37YsrKMiIRatjHXrYi+Ut5eM/9nLtSFJtoS96WBINTyy7EITsoAnfZyg4UaGWzIZCVTf70WH6zJCkrsjHMxeMERCIIjAn7reQ4b2z5EX8QPIMoR1gY14+ofLfopQR1+1yGcTFicjAyIYrpCs/W0qcVz+v0iRdjsyvOpk2GxwX3p+vPUdf2uYT04/i+01rHPMrG/KOU+vpNBRIdbh5OOrHDlOBWTz+6HTFN6xWHORh9nAb5uzFFwkS8XGeAWYbXyrlour0+FUtNl7VKNVJTfVKlIsWnBM357N0suPReWmlFqUlIBjP8280N2lhHFSeS7qXdvYYNo3sNA5V8whzgCYBE8ZsUKY5WQlMqS/9whA9+7zP9bC5DjOBJOK8nKv7VqaeNw7f4LvKCvzNN7QNFAOZV72a++r0/BqCg8SlthV4u8jHQ/CIzNz0+7drM2XNnIcqjyEkCeNEV5fLvJyclT62L2zz0okHUbAP72et5yhBYzy3x2kitlPEJIfpkH/P9ykrNRA50G5f8jMMXlk66GNvt0J8//Vgnh88sirsHxtq+XIZxM0yYwoT3zBCJEImVp8SXfujbZVwu0zIU9XPhxmc1cSLRIv1XIMqV7rFn7C1/HRtHdlv2I2o6chF9DOPWzqnY1UdLdCtyvpe6wecJ4TN8SzYw67PMxdmMfah6/Cs5wVD1esI7dmr35b3zrtEpJepO6McT1qnEMtX
Vo+zY4ay6ZptM0omdINZhSnA7dHq3G8mCycbGYbZtm8HoaHxHZ65MbFWLo6+dlQWBnivXbNXEWqmTmnWYc+bOwiVHsqoka4dirtJDSdJtrDra8lj7b/aezxclPx9N1VEdWKsDeEMdrsfD0PIq8t55iz8MvsGv+EPSHZyIRLjcPZR6w9uIvMApvllzg1ZqiQ/6RST2tI6mDHamY5vpW3wXOcFLWN9DipXGbv+OSann9S2+i63Bp2hlM/G/Sw172e1/kdD598kLnOLumgcsJ+KsmMg62cYlP+Pghf9E4uQY5DBKa/VPbprgzHS8N8dLvFqufsLCXg2njIwL2eNJmoWQTLn+lep/4bl37teOoS84SJiChG299PNRq17f4ajphFoxVxki3zIF2EwRPH13AnPFZLUtlBAp9Q3Q0JSTEMhmqk5sNQGlH09Y1//KdHcPpHF+DnZxgliHacvt2OndigQCOTQ25hEKqZ1SYuckCDJXrgxQX59HW5vbZBuZYnpooU4Rq/F4KPdcoStUmLAvs5VY9TisBhT694NBQXUPwG+xUpL9lVirTjtxu2NUWa7GtrPC0rpErdex6oTNBGieaKowFfOQBQFBlmP7zGR23en0zcgLnKJ+c4SD0cczWjmbjm3m/f/hY3593U+yIFa951MN5ieC2XNn4ZZHTAf0IhFGhVx6rlxLyzoonokSf5lX4rFc2VeQKRZCvLD23yc8mJ2O9+ZEYsfWSSTCsP/OlJM3VqJLx/KfYIP8ctLVSf3rxfSyjJMcoTpJnWrqvir+fp47twBZtte/lXMp5rtqfoyGfsqmhVYysrES64g3TRxOEOswbXE6velPIJDDli0zkKTkHViZGOSXe97irqbqpDO2+aY+rGUAACAASURBVIR5Of/rfPWlP+Y4KxMGAWq60EQPeOe8/U+0V7/D10YPj3M1VgnOlbog+x6zakC/khPap9pNfDL1JFuJNcMn9FIg99MVp3arJ9Mg1lmJnRimY5tpR9XVamLqZpB8pVhW7KbOv6+thhVwjTAz0/oOdRIv29hXzFXGUvdXBPn7s3dk/Thget6bE00gkMN/2QoDETPFfpnHl/6K51+3FhJSscqYsZroTYZANEWfZmfC1ThJaXcltowOLrGQHEZTTL6MMd4AUr0v1ZpYs/Rptd8z62NvZ8Xxm4GjTuzg4HDTyAuc4q2FT3JvSS8lJV5qalIHsAIROqX5zKpZl1L0ZBAvjcPfYrhymcHbURBkSn397C/6JiuFk4qwzUSu2FRV8fCuJcg2OvNU7xcQJr1AUKCXOYrnnU50wlywY+x7vswPaKKBfOx1+CHZp4lWGfwLx43MuoozWdiPw+1Cp04R1K63cbYINz5rqTwrImkK36qP88tsQiC9Aa1dUZp0sa+Yqygp/+iCnxUl/yNB3d1hYqisjPC7jyI8UfGvjClbKxkudgNYUNS++3fuRvKXjgm77dydoOBrh2QBbD5hiulJuY94i/LGxuEE8SczReRnqi8iezxsYH/C+3b9bAOBHCoqvJSUFFBSUsCiRfb8mR/ZcS8t1T+lTAwiEKVMDBoCWFC8djUvake8acrirMQ6TAmcmdvpQyCQQ3NjhK6QF4HkHeEYMnkMMZwg/JOceAGKyWDO2//Ea1v/jdV9u0h27EVFUfr6Eq02VJRZb8h07lBfI5vKr09dzUqWcpyKcjq4KNyp1V/dxS9s1MRGcCEQxTWpwjK3A9OxzZzqK7GgCPa8cuYzxNeW6gMN/WpYOytYzyvcwHyF7WZ5x5rXxKbCXh1mukzHe3O6k34tqjWKJ/IaFGupVL6uYyuxeYFTmh9sA81aZo/e17pU/Iin9xRTWRlh5mNfJffMe2xmt0FIzcMgAyYZDj5flA8+UMYDgUCOpa7G0qUSr7+emP2T7L60StN2/NVvHs5KrIPDNKe+Po9585RZxXnzCqivn1yfQb13nG/xXZyuP8f27TNiaUsu2+m15VxKO4AFKBU/Sv+gs0he4BRs2MA
zfU+S7NgFQebRRyP4/daTbTIuxDhvuXToZba2Omrmo6tH9ber4gTlXEqxZ/Nj7qSMobWPa5YN7/NHfIZ/x9wAQsbHVY6xhv7qTXR3D3C5e8gJYB0MiELy+z+fsGavJXs8k+Jt/PzrZVRXjyKKMd9YUVEj1a+U6VfDVnKCQbGQY1ThpR/1eXAh8QDvxFZ7Jn4lp+nsZ/n0jN+RifWLMDSEt+m57B+Uw03DfAU0NfF+pyAjIbKKY9TRwhraYuUp5vsuKlJezwucorCuFlcoRBUnuKTzsTb4WkfLNCHB3LM/QgD2sZUIbmRcRHCzn03kkqj+298v8NhjHubNK6CmxiyAVc7ozBkx7bFTuPFZgz0RTF4b5GAfZyXWYUownWdu6+vzOHrUjSQpqTWrV1vLrwcCOTz/l9foGi7RFPQeW3qZ66//k4XQ0eTJuaudkjAy5gFXTgedlKe5pyjHWJVkRVD/bBtrXluqf3rTFDUh7rd0yWyItrKPLSlXPkERnlheJfN3baNJZq6juBlhlBkZHV8xVykgHAtSrVd9VWXHDezn8/yPpNfepVNk1KPWH6Wy3XGYHKZjmxkI5FBTk4dx/lx5/kt9AzTJDaz6eK9zn42DsZVksFPPqLZrqrp7NpiO9+ZUI5U4odVnttS4bdeYCkTZxD7+WXyUTmk+yrMY38/J+OhhhDzz1VHvEB9cjFiuZMajrmym2r6YbkLMMXnHnjCiyyXz+98ba89T3ZdOXze5OMJODtOW6drppRN4Kr5ynzYEQ6qYwGNLL+P9yb+Y1pSKoszly/aFQDLp/Mz28fzmHjqj8xGJIsVWESVE0k1XUy1zrIWBlHSk0/Xn+M7RTykS+eJHPLP6NzctgM0LnOLPNy/k3aiZRYXaaaY+71JfP18JHTWxTlEo5irDFoMBe9hVNh7bvoa9vMI6RshPY7+OF95UZ7q2mdlonxxS89hjHs6c0U9OJaZdPsDb/He+BGQ3bXK63ptThVQ2cWaoAZg72JGGAKFMmfghT+8p5vmaKykso8z7HoEoA9Ub8bz6iqn1j2EPgkD/voMMVy6ztApSsTNxnByZ1tY4uzznvpzSOEGsw7RlOjQuai1oMOSllE6+W/QS1R//LZKcuIoFMoKANkjL+em/saHtftNGuZwOLrIQV4ogz+4xptv5qZyuP0f9q39ISFYNy8dvVp7LIBK5seDXLBi0rl+5WeQFTvHG5vdYFT3KeM9ZIEpR7gChEfMg1cv1mFF8dozg7aDUN63maxxFtjlDD46VwFRnOrSZDlODQCCHui1uRqSxvkofwDo1sVMLq/rWsli9uOzzMdD0ovZ76Wu0raxhkpEvDDIop1/uoyBTxiWacv+aVSNtKbaEnu7rgHUNqkq52GVqg5PusXm9Mt/7njJR5tyXUxsniHWYtkz1xiUQyGF7XQ5DI27tNTc3GCWPVA2/xyOTP9xHb9Rn+r4QW+F0i9Fxr8RadX6pApLT9eeobfuTjNNczVGf2fiVzTEm0ubBLr7Fd/EHwTNpdPyqCXtiQFimpVtbp/nat8FILhJjH5kyOmOqr2mIak2QFYhDdpjqbabD1GWi0yadezMzvPXb8RxtQ5RGTCe8BV3ph5ybS3/LPoYrlxkCwnZWsI42RkhXTyPT/kUhnzBraOOHfIVLlGkZXOU6yzb9ar+ZVZB2JB4Pryz/Z7ad/IJN39vkCILMvn032LTJ49yXUxhH2MnBIUvoBY3erGhgS22uIYAFYgFf6gZ1aEigN1pk+b4q0rN69ShmUvNrP/8rg7hSMjuEeAn6VK+Dcq7fabszywGsSvz3CohIRPzlXGs9POkBLCg2HpewZ2+g8qq3NsG+Jp8wX87971k6qsTfy2UpCqUXWDIneWBtzkRZgTg4OEwuw5XLCJ1/n54r1widf9+p+5sCeOu342k7xHFpmWVbrxf0E0ZGNDEuvRVVFSco5PrEHqwJg3jZT21sMtgVm+TVWba51xhEkgxWQYAsisigWQY9suNeg62
e3x9l6VKJTITLZFmg8RvZOU+HqYUTxDo4xKHOEIrBLo7Ly9nU94JFyvD4EYjSRAMjS7/Ajh3DCaqYT1T8KwfP3o0Y7EKQZcRgF4Xbt1oGslaBR/zrgUAOixd7mVtSwN21D9JJadbPywoJcUoNnKLz/WmpBotIrMwNcMC9mXI6EIhSTgcH3Jv5YeQhrIPFqC3fPcvjRDDxfdWnaAuY++3ZXfkdw+NR0uAdHBwcHCYez9E2jrOCDRw0zfLRK3eruIJdzJ47CwRjGx+i2OJbkk92Jif156xqWAfx8nShMV39dP057t7yEO5gBwvELl5Z/Q493dcNY4PKygjnz4e5cmWA8+fDvP76EBUVqsdueoTCM6C2Nu3POUxtnCDWwSEOb9NzWopLHS0pPNLskhhgqMqAqjoxwI4dw1y+PMC11sO8mreRwIW7cckSAlEEosymm+NDj5raIZyuP8fHHw4mfo8gEwwKLF6sGIHX1+dRWzuDYNCFjECnXJZ2go4QM2vPzU3sFNXzEpFMPxtvjj4Z6FfahXAYKQ0hjA3sR/i4j4d3LeG3/qVIQg6/9S/l4V1L6IrOt/xcDftooc4iEE09uBCJMogHkQgQjf03caU7fWQqKqKGGW/H1N3BwcHhJiJJNNJsMd6QWUMbVZwwvCoAgiwjRI2TsNYWbPYECs2+f7w6Dl2hAkpKvMwr8VBS4uWJti/QKfmRcdEp+alr+xyn68+ZfjYvcIo3Kxr4dMkNLlxwxR1LGoH5/v1JM9kcph9OEOvgEIeamtPOCnqZncU9C4hEtJW711jFc91rtABWJS9wije2nWXd4K6YxPzYSlsvc1hHGyeDS4CxFdWSkgKeaPsCA3IB8Q28LCufDQZd1NQoasrKa/qtXJA0XTWqBU9ldHCg+kd0dw8QDA5wsPo9yoROw3nt9TzF2qUXMFsZVNKms0O8j62dDkq/0i7IMq6+EGVYCUxEcaGkMIlENKXl6Hy/aUqela+tjx72spWV4inW0Ba7lmP7lHGlWKWVtfQsiRzyGbIZeNsZeAj85jcubcb7/cY2nmiqSOuaOjg4ODiMA1GM6RaYIdDK5tj4QWIBFzXPcDOaaDCZLLXDRJaQCOjTjOP7pkG8fOfopxI+pY6HNvW9QK82HorfrzJZnyygdRHFJUe4e8tDBAL2BQ4dpjZOEOvgEEMNCEU5wgIusp5XyLaKbBSXZgC+3P9j02O4e8tDrBpts6xRHSGPBnGHJjalCDlZzbDGv5bYeei39XId/cpgsSvE4aWHiPgXMCrkEvEv4Jet7xqsbx7ZcS+/3PcOo/4FXBTuZLn/x/Tv3M3zr5clpEdn07YlPhhNlWqtol9pV2nmafKFQcNrntxRXsvfhESOZsK+j61JDdC//fn/Zlor+72l36en+zqvfr6VI1THOnIBiRyOUE0tu7lOocke5VgQndjhp5MCnQp1It/0mtasp7higRPMOjg4OEwQQ6urKbVcQQUlx0lErTNdR5tlIFvFCQ7wBOV0YDcwtdZcuHl0SZ9MeM3b9ByNo99OkREnIMsCbjcW6cYyUURt1Xfz5hlOIHuL4KgTO0wJJlvN0MyaJhspNPEUc5UCwnRSRqnrQ55Z87+1gND8GMwRiFLq+pDOaHZrWfMJc8C9mYd3LZkSNavJVDStJPpT+R1a+dO1s5Jv+l9L8K+c8/Y/IX3zaVtKnr7Fd3EyeB+NNNNJGWUxZcbl/h8TOv8+9867ZmobIBKxMKhXBxZm843RWJK6/VToZKrJ3d0DSW0Psm3D4TA+JrvNdHCwwrk3FdJVgf7hkhZqL3zddglTMVfpoSTpNtb+7Hax6jeSvW7fW11PmRjk3OVZhtdmz52FKEds93NFRVFefPRHfKdtIV2UxcLZxM96c0e4GHR0H6YSjjqxg0OGNDXlmQSP2Q1gRUbpZyaXWKDMCEZLDXUg5sdgThmdSeovM8dMgGEyyAucorhiAYU16w2rgm9sO8s9i2Du3ALuDJ4xnYnWKzXqiV9
pj//scv9Zg4iEVhNaVWVbydP1YZAqTtDBQm3FvYoT2jGZzTQDMR9d0z1idR+KRJGzdI96vbGuXnft2lnBAi7iiqWvWdViOzg4ODgYSTdTKC9witW/+2sO8ESs3CQ1vcw2tNFqn6aGkZK/FB+94zwTE9s/InyGf8dsxfPxpb/imHtdBt8S5ZnVv0l4PTrfn6TGN5G+PoFHdtzLr6ubkAQl48mM8IjbyS66BXCCWAcHklvQZI6xgZcQE7zb9HUgwaC9Y8hlmCYa0mrY0yHYV2D5XnwN6un6c5rK8b3zrvF2yaaM6ig1teS5BdyzCN7YdhZXX8jQ/bSzgg2je+kKFSLLAp2qdH9cMBqdn7jSqa5yK2JWOtl/tdNPkiKcDmbfrb4eCOQgZFRzZK46rKYk20VJZE+sRxYExQxePU6IXWsOahMu6vVSa7EdHBwcHKwxK1sRhoYorFmPt357wvYFDU8hjI5SxQmOsNp2TWt8G93OChAETel3R/UvcXMjK+ekIuHif7MIMw/4fzz7CdaNvkw6fZNAlHVLf20oU1IJNz5Lk/vbFoKI1oR37KR/38Gk2ziTstMfJ4h1cCB7nphK6o5ak2FWj5pIl/RJ3i7ZhHWjPFaj6kJiBDeNNPNlfpCheIN+v4lYXYv4meWTwfuoa/vcmMqx5GcDBzgZvM9WbaqKIcCUBbpChWwY3ZsQnJopNw7ipZHmsTOyCEbNVrkH8dJAs+ZLl43V53Djs8gej+E12eOh7cEj1NXN0IzqEzELVI3vq6Jg5qrEhm80fTWKiyOsoZirqPeTzzvEvn1jSsTDDz6EjPW1XsUx5s0roGSckxYODg4OtzJWGUEC4Gk7lBDICn0h7f/VmtaxttqKxD6tkWbkojFf+kd23MvBiu9RFrODK+bquINaQRASJuTV9OGQXGzyXjyyJm7o90fZ1zrM86+bi1oNVy7j4V1L2F/0Te0cSn39eC0yrn0+2fBZl0WU4yKKy6J0xmH64NTEOkwJJruGZnw1sUptYqnrQ748412ODv5FWrY85bHUf8Uk3AyZGvZyhOq4/eqfidTH6eYGM+knRDGldPHg0gGO/+wzhnP2eGRLe5X4eskFXDQ9ZpEIUVyU0sUz1b/TZldP15/jO0c/RZf0SUrppDnmebeGo6b1oOV00MFC7d8uJNO6GIEokpCTtOZo7tyCBEVmUFYhr1wZSHhdT7r3ploHdTK4hDphF72yTzvS5Ki1S9Y+s93dYctzSYUoSERlAR+9yIKLkOxDFEGSwO9XaoCfaKpADHZZXutErJUgo7go9fbS8D2vY9czAUx2m+ngYIVzb1prNqjIokjP5T7t37NLZpq2/O2soI6WmFOCPc9vEQkJUWvXKysjeOu34znaBpLEcVawinab+zP/jvGUW4mizMyZMh9/bNSfSIdAIIe6uhmMjIwdR26uTEuLcfxSX684MsSvGtewl73iXxp+A4fJxamJdXDIkMrKCDt33qBMDGpWMTXstVGbInGMVUr9Y7SMHw5+Ia0AVk0NtpbWBxA4wCaT/QokD3pUZLxcp1ANYMWPeKb6dzz/ehk7d95Q/EGRKRODHBxaxRNNFaYra/Ezy1bHrKr5dlJObduf8OmSYUpKvGxou3/MF44FVHOYtRyxEDSCS3H7t0qfnu8nZb2q1epytlbg9QxXLuNg4wXWiUfplYuxK3CRqnZJVSPO7JhlJFlRZ+xlDqHYcUnSmP3S9rocLV3Yfqq6YPqnrDgLdIVnU1fj4gePnczgmB1SkYnFlIODw8QTbnw2ecKrZPRRl30+082qOEEPJTzA29jzRBW0UpNg0MWTtTKn688R3rGTnst99HRf57GllynjksXnJ37BSJIE+vqUzKtg0MW2GlfaWT2VlRFaWm4Y/M3jA1iAHTuGdWM5o1Ve/G+QjEAgh4UlAiUlSibSJ0ryeeY/vJHOaTtMAE4Q6+AQo7Iywi/3vIUk5NDBQvax1UZtytgjJGAd2FlRyHWqOJEyaLAW/kn
OmA+pSIg5msT8tpNfIBDIobIywvuNbUgeL5ekUqo4rtmqzC6ZaehU4us97QQ6o8yIzSC7Elb2RplBBHeSTxsDPzPvu3xhkAcfjGj1tIsXe02l8xsbh/F4jJ2zx6PMAE8ETU15jEj2f7N8BlW3O8ttJEQCgRzlXHLT8dq1N2s+NOKmwfUCMB6fwURGyOO/nHnUtA7MITn6WnH13lYDVwSBwtonUgrHOIGug8PNZ7hymWVgasZA04vIubmm79Wym3f5Ipko/g7K+XynbSGz587S2uDrr/8T3176jqkdXA17KY+l7Sr2bolk245nlDzq+FvbNnkqlZURW/7me3zPEsFtsMpTsdMvBQI5bK5xE6YA/UTty9dXOIHsJOOkEztMCaZS+pG3fjueV1/RbFjaU6TeqGmvynbHSGduSCBKFJF2VlDNYUtvWIFoGlYqeuRYamdiQOX3Rzl/Ppw67SlmrQJQuH2rJlZRy25a2Uy2VZx132w4Z1kQOC4v1+xrSoUgDy7pt50SHQjk0NSUl2Chk4pM7k37Kb8yPu8NdoUft3XvqOcG8PzmHjqj823Y7NhP/RKIInkKEIaGaGcFjTRziXLbn092DFHR7aRuWWB2bwIJJQ4CMjJQziWaaKCKEwn70ltMqXXseoEZxyrJYaKZSv35ZJIXOEVhzXpLI5qe7uu2ts9h1DJjyR5j42dRiLJ6rcSOHcNj7U5QiJX4PK21KVZjklyGeZyDtFKLdX+VSbrxWH8vA1F/aUpLIhWzdq7dvYanC3fTFSqIpVe7KI5lO/WiZkiBiMTapRcMNbnx1kgLB9+nK2Tm464sFFzuNgp46VO3EUWGVlcT3rEznYtxW5JJOrETxDpMCaZap6dvxOSiInJDVyw7ETUQtaoRTYa+7rOdFWxkP2EKSewAsu9Zq9aDWvmm6lEHxvrrssDVaep5mi1cSNo1lz0eRj77x+T+5KyhY6h452WCwcSOVA3QVeJ/T2QQPu6z5d2X7r0ZCOSwZcuMWKquGcq1LiqSaW5WAulnS47GBgWpf2P13NSO+/jQo6zjECN4TD4v4YIkglJGyujgl63vKtcq2IXAmFJxOmnyichEcSUM2m4XrILUpqY8gkEBQcAw6SEIMvn5EA5b3w8CUTaxz7CqAMpkT8+Va0DmXsoODuNhqvXnk0nxogW4QqGE182ewUAgh+e39NIlfVLzGK/iRBa8XuNR+qBiemihjhW+t7jxyJ8bJu+txjOqN62AhHkQq6Y8pzvpLidMxtqdcItv59Lvs2Sqq0fZsWPYNCBOrhGheKyreOu342k7pPWdY/XM4HJBNIqhVjmeQCCHb3xjBmGTRKhiIcR39+XfshoTTk2sg0OWGK5cpnmD9n7QwepqVXE4ETWtNt1UYoEoTTFxI3XPxVitVKWue023lqWoSNk+Ot+f4AkarwxsprRo5Xlqn+THKxClnRXKVkND5J79EYIkKck8koTnZDsfWtgSBYOCIf1Sr6rsCoUU+54U3n15gVO8WdHAAqGDuSVe7lmEaaqyuq1v8V28XbKJJ2tlywDWzQ0OVr9Hd/cAFy4oXrSn68+xnxrsDlLUc7ujdh3lnitEi4o5zBOxQYUeRc3abgCbT5jvFr2k3fv9rYeQPR5NKVMRIEv/PgNlsISYWUr8dCdefTsYdFFTM4OamhmxCRghYdVelgXTQYxhG1zsp9bUYkq9H63UN/XPs1nKsoODQ3YYaHrRVLE+XkVfbSdU3Qi9ZY5okdabOUpKbC9zWMsRToQewtN+lJEl9yMLSltkNZ4JUQygrWrGU0wP5RnY/xXTk3iUQ0O2bHDixydm6vrJETh6VCltMrNGSlY6Ff/beI62aQFsNYfpZQ5aCnLUqEHxLf/fcW9JL3NLvNw7t49v+f+OrTU5scnLxL9euZi6GpfTRutwglgHBxvs2DHM0qUS8QP4fMJaIJqOb6u6iqKm7hyPzRx2ZpS6qa7Spvc5dfG17cEjpp6g6uC4lt245RFKSgrw1ax
iS51Nuvt3uO4YH/vQJEVF+RQV5XPZMrcmKLLUxBkoFvpYxf60a0C2YWfdcWAfSrWFGO5DkKQkg/WiW25i8cUFLC5aRFOszkIj1gwSxb4BsqUhO8hianiKTcIz9Eoy78DQkIuJCXC55LU4EIjT0nKKbdvG0zIHg0025aA5E7wZ4Z/Z60qw4NHwbTrNaJAoL4/xyAtBvtTZxLH+Qt7rjzp6nw4cnCdwnFgHcwq1JEsNT9JHMakbtppOPRRysz6ho6hoQa6nXaf3uPr9p+lmGc+zmnyiif4frfREDqcSMid2IGuXBsUwzVW/5OZtVzFecSs3br9aF/FVkzqpIZwaS1IVuSIRjUyHFcz0dR/e+1GbY7dGJlletZGfjgbfHqbVZM1K7XwMJwMSal1QkMtHS+niNvaZXiGOizY2JTNPZgbhyFR2ssoO5g7jFbciLVwITM8FFzEaaGIt7RRyArPgRkwSgWlZr1DIrc2aA6vEQ3RLJUwGSvhdQ7tpkGlL+18zGPejzOVINI+7a6Yd2QeW/8wwq9sk3ZusIHFw9sKsOij36KsIsRgC8HM+a8Deau4AeoVRdlW9Ss+pD1lUmhjBfI0DgTaxRic3BQLxuEBeHixfPkVjo4clRfk8cucg68ZakrJwoihp/qtk8f0MGF6tkAF5v2hsNHw/XfWLOqOtyF1dUL0eT/gdVvM8eVKUQgaSQabW1lO88IK1XrcDBw7OXTg9sQ7mFEoP2hd5iZ9wLamEQr7cSb7THKOiYkomJan5CoOSPgNYyAl6q+qJbnsCMCdTUTZ3H3IfRJSFidet+35K6KaLZQz0nzQ9xooQZTbkPma9VAJx3uvPjACqrs7D3r05xGLgcoFbiDERk415BV6iSUfdCBoJEnYxSqpBBNnSLYTp/uNU2CXyEZliPHBJ8nc2lxWS6O9Pzwrt9NHMLyiyTkbESW1sBKCeJnrT9MkGAnEaGsZpapiiL5KPSIwYrmQP4aq8HxiWEV+1ZMiw3xWg2D/MG29Pr0ep7MSVHMgaS7UzL88cjMj7FGhlt8zmn6ytuoBxIhRSLIR58OqXWfer2qTkSOr8lntcjcnsRvCZst+nX5u17+cyznNUsUo4yNi6Dck9Vo0XbznIHUdv07SX5HCK53KquWn751m0eUNybqqZsmW+AvPfpIQeGnMeZGz1Wu4+eI2hTJSXKDsK7uVLnU0W38mBAz2cNXN+w+mJdTDvEV8aoIOVBg4sgMCpCRdrfnk3vrqtLKzZyKDkNzzPIIvJO9iRjNqOL7+OepoMHBw5UxJlIVEWMZ0JBNmQOKkr+TIqX01FKORm69YFhMMuJEmb3YHpUudUmL2uhlkvVaY9VnV1HtrbcxJaeHLkfSLmJvV3H8VH3e5PmJ5HKY39+yovn+U1jITMPe7slW8H6dVk2UrpooYnWcteG0y0EpuEXZrex9SMfLrXFYRCbq680ofLhaYE1UF2oZQLvlK0OVEhkG/5e8eXBmgweNZH8dFAE5UcoCfRB2iFcFhga62bvogsESGTjMnVHpvYxf6xr+iypp7QIfokc8bZcEQek+t4ONmvq+7lVsZ/PiMUcnPZMneyFPtjH5qc81aJ2cLoHnawksX0s5qOhENp5TgKFIonOSEsYTJQym9bjrDmnYeT2V2ltF1dkbKZFsPsfqPwDSIG7NMKRH08UDcWNSbwUEszgiSRt/tZw+qhG15Ywc7y5wnSI1cs0cOusu+wcslPWFizEUpL8YQOafZJufrG+jfpoZRNk09Tt/sTpjrHo/j42tA3030pBw4cnAdwnFgHc4powwPUC49itsHHEMlrf0bWgkxTJaCQO3lCh8g72EEf5n04xtcTKGSIdtablq+aobHRo9tkx8YEGho8XHmlD1Ga0vTvKrBjwN6/5g+Gxsr9a/6Q9rNq7N2bg93saETyWzppntAh8nY/y7/zRYNzCkxMCTqykJnAS5TreZFN7NKUkLeyxYLJOFHyxhSbc5/l0ZZcTfbs9rKfYeR4y69
Pfz81ScvhumOqIAW6IIWD7EDpLZSrKNoSfeD6oJAa0YYH6DV51nsoSTCV2qtYGJvQ99fCtEOcGnTyNT5kWVZfnHjP7DmXBEFHLnUuwoz0KBRyc2eNJyGzJgfXBuN+NrZfQ1FRPhcX5XHfLdloWzi9SCXvq+FJbmOfDed1Gn3xpZqe+dS5pg6CdLGMp3Lvoc21WbtX5Wzhxpa/Ne3BFgSJNWsm7fWxqqBIMgmSZFr+fsMLKzjW7+e9/ii/bf031vQ2JXuE6elh4da7aGqYMnVGzTCKj4hJ8FpBhMXOWuzAgQPHiXUwt7Bi7gRwEWcZXYjEyMe85K4w0ZPjCvexsPp2hLGxGfVsKsawUcbECmr2ZDUiEdkAl1QZHcWRlfLybBmwN2+7iuaqXxIUw7q+3EwQyyg5KtDYqO3fUhuiC++8g/3SCtM+VgkXO/6fZ1Q9iZk6tFIygPBjbjDJqJujhB52e2v4VviruvLPxtc+zcaynyIyheLsbiz7KY2vfTr5PVNJWu5t/0vDIEXqb+RgdlB6C40yq2a/93jFrQT8Zk6q3Gs9jpf0zoT1+70Edc6o63iYRurJRc+GnsMpvl3wOCA7OR05a1MI6VYxtm7DvCH5UioNliyxznxnCjPSI0/oEA0NHmKS0RoiO7Qx3Dx79PJ578iqe6m/yEu0siVjOZ1Utl2poABAV4XSIVQy3PoMg+EBbnz6Gv4YKCcmuPljoJwbt1/NeMWthoRMCuPwtm3jPPHEKWZKFmWnesisR1ipTMgc6Z9dZy124MCB0xPrYM5hrusqJbRDjbMj6uNkp0elf8dK7mBHouc1s8ivl1HauF3nvFr1r2aiTRukm3cC5RqB8LnAxRfnJ0qJ7WK6R9RI6zSfDxIl2UaIIxInhohIDJEJJgx7Z+XrpPbktrGRVRyQSbBMtHjtjF9BeXnMNuFHav+ilxHTeaTWA3Yweyi9heb3XCIQkHR950qZYqZZnkwQpJtG1zdpiH+LPoIU00cT9yXXm+m+Rzmo1uzayo1PX8N+VlFf79HpLOflThqSwCk9567jYeJLA7bXiUz6uw7XHePhvR+lL/Zhil3HuV58mecmb9MwwuaKMZ5ec5SqI2s5GL6aetej9MWX4mcQSXAxJPlZGkivY2vFB+AO92BvfY7Tn2H//5nA3wT7eevUJWS65+TkSGzfPi1z5AkdYuHd1eyfvMWQ18HHMK0FDdzc9EnTuWHF0QCZ7VmFnGCAIkC/Dx6uO0bd7k8ks6V+3ym2R2+nkv2685TQRa+B9nI24KzFDjKF0xM7vzGTnljHiXUw5zA2QCU8jCUyKPbhJcpa2nmG29NomFqTWwSFXnqkkumj8/IstSHNvoOh44PEezYIhLINpSfWroElihLvviuPM9UQ7WAlq+kwOZfybKb7LSQKGeBWvs+PuYFeghTTSxP1rCx4GQSZxbmULnpmbfhIth3ZV4o2s4k2G/2208Q9DrIDZZ6Z3XNBkJAkIeXfEAhILF8+lSAtOx2OrEQ1T7OHKkPyKHXAS70zdeRWsYmdpmXKgUCcN9+cds4U50WYnJQ/z0rqeYQ+gjqHMdXZFR99hP+fvXePb6s6836/W7LsyLKTWDYKaWTZoR1My/R956Rpz1yKWw7lMi2XTp2TkxsJBnKxyaUT+tZglwIDNqSnpG+uDgngEuIkzUTnnVCmbwvllDF0LjTNaenA4JcO8UVpGieWIbYs27qs88fWlnXZWxdfYide389Hn8SS9tba0t5rr2et5/n9zt9yZ9oj0Sy7kgXvkr+3Yi6wg80pBdSsVpHSZ1ZP9Eg9rvRiW7Hty0R0bSppuOFXHGi/kcwDWPU7KSoSNDXFB5jprgNQV/pbLBuiq6/Z4nbnsGXLLEZGUrfXwhAt3KuKkMXcB93uHP7bJk3VPX4fmhhU4kTwS0UbWT+0c1ImmxKvJYkkHTKInd5IYSfJZUFVVZDt24dUCxQELjo4xEpGUgah+gxiYz8bUgSwAgiTbqDRLUp5qWg
jZZE0rjLrOQ6zIotj6IymOCdSaj6T2cFMMNu2DVNdHYj6AaZL841NP9ZSyLTUNuMAViO5TjYRG/1cwMFeNtFIPS666MZFvWkbR/puRfF6ERh5fibXtKZGoa0traIJAPXmbRkFsCC41vtvGdkkSTJDqy3U+80TA1gg8reasn/0qCXLlPnsaOYBQ/GouHbGPL4z8l3DABbgjCf+eArqvx0XwK7jAF2UReuCH6wJ89OKelXobnNNXIou1dXRc/FE3Uk+PW9k1LfWMRIVS9Kz7DK6lnsp1k3tjsXvV6ivN07lbLXenyTKph5XueHnXo680F5JpsdjYYiDRZvp6RmgvT3Zp1frb41qvQECzKIh8NiYLZqqqoLs2DGE3R4m/n4gyGUwKtD0gm0zK5SjhJyl9G/fBcBPK+p5oCaXgaBWyxzPCHkRrYtRhNVK1deHWKO8GLG2m7iFiFgbPolEMnORQaxkSqiqCnLqlI9zPQPROtSx+pCGdOxY4kl/mttNfawf2klXREyo21uYVshHO4agvYQmHo48G3+jzsdHU6gu7edPJuqiSOxQWx+7fbTtoqgoOqjunKDBp2ZvFLtfgYmucKmqBstyFGBFRJlT8+It5jzmjP1949EG9Q5HAQsdgp9W1CcFod2hBRnuTeH/5f/gRP1vdF+drBrDKxmttnCZ8y32sy5SB656QKZLEvL7FUzKZGXoGF8rqQKNzpTicskTWkqfN/p/I8XlR/oexNryHIcDS+JrJQNLKKj/9qhvrSiOtttLMbUtf8GJupN0hz6Rsk3ZHJ9GX5+ie3673TmsH/zvcaJs+6jNcJJolMsh1E1/31EDxGLO02LZwF1Nf2b4Tq32Ot09sAtXRjWqRlRVBXn/fR89PQP09AzwcfMLBJ1lDCkFqkpy88+59fQzUcEpgMKtm6jta9S1PYulW5SqfsuKAmVl+Jet5H8cCnBw8P+MbDtRv6pglesXKVPaJRLJzECmE0suOba6rVgPtqhLf2YzYpYVk28gYx/QZFKlCqf3L83HxywG8er47GlekprPneYl6TL/gcf+6n+y+sMn2OT5NvuoTajpE5HUvC0sc76V1ht2MnC7c6itnZW0mmWEVqeV8/a/8kTLNZHBbLptRcS/MJNVT4HAZJgyV0YHHRFrlNg6rGxquTJpA6iDxSYeZpnzLRYOvhuxWckMFx2c7In3Lna7c9i6JSduFU4hzNqrT/DkO1+ZmKbPIDI/d42vbxMhhGKKmcSZGGLP01haWc7dHDKs51ZTkddxS8++6HMljtnRlhnVBSuEeYlVhr649ebv0RXSV0N2mSMrfAavJ1LMeQrwZZTOn5jO6XbnsHHjrAlI7xZUV6uCRNOZ+Q5rSsX0Gvawh00Iu52Bxu+lTAHWNAgO+7+e8hwqo4P/dFZesvuJluasZJDNFHs+XHVVIaFSF5/0tE1AaUgyLjo52ZNawVgiSUSmE09vZE2sZNpjq9uq2ufEPCcAJScHgkFaWc4aDhoMDtIHpNliIsRB7k4xcBAoCrqDaa0eNzmAVSmjg9PWz6SsrZ1MxhL8FRWFGf5oiEGRvjZZIcwG9gJq6mUmAW8YE+YUIj7FXMBLMaV08XDzPKqqgsybV5BxIJ4NWhCAJZe1yoGUaaCJ7VQU4sRTPrdwmG5fslejQpiWiia++uaWiW38Fc7ETFyIiFZxOEWwkR16NbEaxvWMoxNa4aJivqnswOtVz2c7F9jJFlZyxHB7VexOfyXLRQfduAyDHoUw+6v/KakmNpdhgiiEyY0+p9VCAhlOJo5eBzffHKS11ZK23jLVvgDMJsHqNcFpH8CCUU2suvpaQzO7nNuyEvPTap43eepoppbE3zuTmth04k7ZotU3pw9iBc3No3XSV11ViDCZMIvgGET60t/nFcKcuwyEvyTTCxnETm9kECuZ9pTML+JwaGlUCdalKQwrRwktcGI640lx45v4ILaY85y3llFmPWewGpf6M9UBpv4AWSHMR80tU2apMbbgL5P
vWFBGZ1QZGlSfxP1siEmx0xONURUvMxVu0gRk1FXwTAZCsX1EZsetraq1spx68/ciqcUi44GXxSLYu+oN1rV8yXAbvZVbSWomfuJivH1H8jmfiLHCspqp0GpaxZpwS1J/kTp4TN1uhTBF9OpmkQC4TN2c/OPcJHXixllPoAwORESkSkf74cixaUJM3bhQEGlTScf7/drtYd5///ILShpu+BUvtFdGVdnvrWiLWneNB7c7h4aGvLjJjv9e9Hdp1YkTs0FAYLcLGhvHFsxqK7EmgimybQQ3Xv0eP3pnNA09s5VYtb9WJ4kjn2cX3HVXkGOHwTdsLEoo+1TJWJBB7PRGBrGSac+rjvWGaXG39DwLwOL5Hxukv411oGS8nRZoHmZFktqwnrBMNvue6hvt2FazUn/HVqtgxeff47W3ZtMVXhCx1TFF7Y7u4zkDhWnBIVZG7UkyTxvPPDDV0tmyOW6FcNwAXVitHHTVs679v8XZj6QilyHm88cUg7XLwy5kOjGxKeQa2mpSdn2INTfAs7atrOrbDWZztAyCUAhhK0DxqSq65QZ2ItpESTE9hsFm7GSKNsGX2QqyutKsH2QKNpj383dnjQXqQN9OK1GVdjLtjNKpHUsy43PXYVgWMdbvOM99jMLatTwgdupk26irzmsr3kgK3K+6qpCL+57nx5vfZF1gT9LETCaB9X/9FJy9WEDi9aoQ5GDuWm717MzqWCQSGcROb6Q6sWTaU89TBoqfT0X/fmT1Bxmo02aK4CZexYz+zXKBUxWXiVMbVjITlgEMBYcUwjxZ9MwY2zwxNDQMY7VmruprtYo4cafE7Yq5wIrPv8fhX32GrnAp6tpTDkQEXO6lhWGshvvXVnlWRoSbMvtNYwWpUrddU6tsaBjGbM7sfEkUUlH8flb59rOv8iAuOlH9b4OkUnYeIY9OygxfNxPOqC2SURoahrGYE6/ZVL9pegVuBbiv8r20v2fi/j7/5wq3tTdxoeciF872xf3be/oP6t89F3m4eR75ymDcXvLx0Ug9AF6S0801NDGlFRzhw/w/ZaB6PeGMbs9KilVShYOhVVGVYiM0cS1NlEdTpdUC2MbGPPx+ItdUuu8tG9R+VgawE0O3t8DwtXSK0kYMVy3Ff8997FE2U8Oe6LVjJsjail/Q0+MzXHkerlrKHTtvYF/RQ1GRvlJ7P83NQ7z/frJCcyK//T3cePV7xF6LeQzyknIPd+74YtbHIpFIrjxkECu5pHQbKF/GKmLetW0xO6rfRokONmEsKyiauMbPuY0XWZ0UGOfj42nP3dgXXU+e+9ioYvK5AU6d8uF0ph6s5eNjHfuSrUEIs0F5NqUa5aVAC8xVxdcwZXTEBPSq9ZCiqIMDs1mwbFmAxsbhyHOJKBQwwGttBYYrMurKZWa/0UqO4MraeshYvCt2IFxVFWT37iFstlgriWRiA4xYTGc83H58GSd77PT0+AiKHD5ufiFNu4zPz5DsZrMm5+1/RcnKP0dNAQ86ywzPq1LTGQ786vMEsSAwUUZniv2N/qZtbWbq6vQDgDz3MeyLrqdk3hzWNlawX9xPWWTAXkaHYf1sItpkigIowQDBL/w5TrvR6n3mQeQgNp744SfT2kINVy3Fe+rdqCrtcNVS6uryqKmZFVkRV6KCTaYJOZ0FB4s2c+pU+mBGkhnpFNyNFKXT4du2nf69B9jl/B4BJZegswxv84sZpU0PVy3ltvYmTvYUc67Hx6/fJ6vf+0fvuKIKymHFjM95HXfsvXHKSnQkEsn0Qo6uJBNOKquRBQaBYb5NYdEiGw5HAfMdVta2fClSg5Np4JpsbXOIlexlEzC6+qcNMF3RAeZhzJ5uCrduShroGa9khnGZPeyuPMwu5/cidjCd0f0etG3g6b250+JGW1UV5He7f0bIWkAHC/k5txHEwiHTaiwEor6boZDCoUNqLZXRCnQnrozsN/QwJaxGCquVR1Z/YBAwZ0epfSBpYFRVFeT0adVKorlZXWHXVhB
IE2Bodhex2BofN/QBTofL/IcxbTeTeeLFT+mkcxuvxtsYoJzTWDwdDMwqxsJQ3Ov5+Gi0PhGXMqvvR6yHwsGDyaJfJ+pO8tnam7F4OlgoPuSo54to/ZUAPDhZxaGoV6oxgk/RHrXPWTjSzstbfkmjSG5fPoMG+zCmWzgNvUWN+uq6ujxaWvRqEhXC4eTfITdXRPyoM6OY3imf5LvSSD9ZptDYmP1qLOhPclwqpvKzJRLJ9EbWxEomFL36qdh6HLc7h82bZxEIJA6Oxi4Mko+P5ZUdvPZmId3CmSRSkvgpmEwo4eQUz1hbl9jjmUi1x6lCU740nfEg5hZxVd/7uvV5RUVhFAW83okU1hLcV/kez37415jOeAgvcEZVO+vq8vjhDy0Zi/iodj6jbbPmBti+I5jxbxL3PRQVofT3owQCoy2NqQXU0JQ2D4tlrKKV7L4DdSW4uvryUFydDuS5jzG3ptpQJEkl8bX4czPXHIJQIBoImxTBetHMXjZG39PKcrawg15KSP+bCnp6BqJ/ud05PFgrElS8w+QQIkhywJt43qZ7PaqcDdEaWafdx1PeDaziUAbtHaWMDk4r13Dh3Mdxzxv11cuWBTK8JtV7stM5msr/VM25SLaNMBQCMhNgX/Vb3LVtccbHIElPqppYDUURnDs3kPI9E4GsPZRMR+R5Ob2Rwk6SKcdIkCXWQ+6662wGQVL2mAnyonIPt5xrVkUoHliHEg7HCaRoQe0KjtDf/ByFtWtRdM57oShJA70rEfui68nxdKI/EI5N386OXIYRCALMittfZWWI48f9htvFThQUFQk++khb7Une/92Vv+dnH356wiYVYoPa2OA6Fk1p0+zp5iv8lNe5hbGktl8O3pfTAfui6w1VTcvoYAAbvQYCSfEkTrqo5QV72ZS1J7XZLDh7doC6ujwOHrSgZjpPjtCRRqIfbchZCoDFczqFUmw8WjC8wnyM/t3Pxp3bRn212Swy9npVEAhGA9m1jRURNVsjpWZ11XbHDlkHO9G43TnU1Mwi1XmZ6O07WchgQTIdkefl9EYGsZIpx9gaY3QlY56jADEBA0BtgPY31TZ827YDqq/dYbFMVwF5d+Vhbj++LGobkIjeSuyVSMm8OZhEiIkZhKsrjZo6MZZcHi7chaevYMxBptudQ319Hn19o+0rNvXx9JrfTsnqjaa0Wbi5BiUQyMhOSB+pUpwJqa7h/axN4emcHjNBglgytnlSUTMJAJ5v+wyTHbxqJClnKwr9ew8wp+beNG1Q/XETM1ISswxS9dVjOUarVbBz2Rvcd/RrXOXvTDnRcKmCqZmGw5Gs5jtKvJfrZCKDBcl0RJ6X0xupTiyZchYsMFAGVdQaMvui6ylNKaiSCoFCcLSmVVkfF8CCWs/YQJOuAvJ3P7wPAF/DowhrvIqusFrxNTw6xnaNjVhRGE1c6lIQXuAcc31nIi6lm5dsGxDA3RyiPvAYT3nXEywq4d2GljENmKqqgrS3q/Ws2uM//miZ0vTD4aql9O9sJlxkZw+bCGAhZC/hQPUb2JVe0injqlya4OdyJ7zAmVTDrtUwrzC/CZ45AAAgAElEQVQfozRBUTobQpgRipKitjte5dhMkBr2cOBf/jda2q7jUv6GicrZWq12agEfdbU5jJkOFsaVVCh+f1xtrFFfPdZj9PsVHn/tS/zw8zvow57yvWfOyGvhUlNs6pOr3xKJ5IpCBrGSCaWhYRhFZyAvhMJDLX/KPM+v6UphR5KIgirI43SGaW4e4lyPn3M9Pk72FHPLuea4ABbUANVogKoNnIarluJfthJhNqtDVbMZ/7KVl1QwQvNmNHu6UYTgqOeLfLb2ZuY5ksWwJhpfw6PssPw3chlfams+Pm699ves9/2ALsoRmOiinHUc4Ij3Vgq31I4rMJ+95E5KHLOjj9lL7hxXe8fLcNVSets7opYqve93cNe2xZze+w+ErTYEprQKoZL0aJNMKzlCBwt5iVUIxcTdtOKaP8JN1Z9
grDYvZjP07z1AqaHYlip0ls8gh1hJEAt72cThwJIMLW8mBoUwnbgo5zStLEdYrQzffCuFWzelEPCJT5fWhKK0fbSynGs8bVERp2uu0dS7MyGz93k8ChvbVqSw/VExDqAlk4Pg6TW/nepGSCQSyYQi04klE0qe+1iKdLds0tQELjp5rPI1bj++LKs2GAlcaClsWgAZq1SqJ+gzmcSmNOvV52UrWJQtee5j1D1oZt/garKfy1I9Y//vyn/gsV/+NV2hZDXfYs5TgI8uXCxwknVa8ewld5Lb9kZCRSOMVH6Zi8dfzrK94yOTFCSttnaT59s08wBG53+sOJDEGO37POq5gXXKgTgBJatVYDKBz5d9XXJ+vsDvV5g7V+DzKYyMGO+jjA6+yisxqePZCnqNdbUxftt8BtlR/W9Uv7YGs6fbMBVaq6HV608sDKGgxCk+K4pII940mpbcRWlGdbhmghHv6NTHJ+vDJwfjdOJL2/fItE3JdESel9MbWRMrmVLq6vI42GIaw4BPD8HHzS+MKahMp5A8HWpiS+bNiYpLldCjWz9WarvAr0+PzRIhE5zzrYyE0q/4mggxN9dH30gBpXTxZNEz3NX0ZwxXLWWew2ZQnxg/EI/9/mMxElYqccw2nAa50HMxq+McL9ne+K6dF+YjMZtEUaHrrvbS9k7uhLfvSsZIfKioKMzQkGLoWRyPiHqbxgqGWSyC3FwRCYYnVuRsonE6w3SdyUERQjdIjQo4cYSFWdX7pkJQRidf5ZUUEzOjWCwCVeg7/fdlt4d5/31ZEzvROBw29CclL209vgwWJNMReV5Ob2RNrGTKeHhJFy0tlsgs/MQM+sa6KlpVFWT7dtUbVFHUVOTYAMp0xqO7ndHzk4FW39bK8ojFRzLdPvuk1smOhDJROBXsaR7hfY8STeO+rb0p+tukTsscxe9XeGpjb1z9b577GD+u/QWf9LRhFkE+6Wnjx7W/uGS1wZPF/zpn4rqrvcTWVsoAdmwY1U5+9JHC9u1DpE9zFRwyr2HB3IEkxetAQEkRwIKWWjx24tumlkaEI17F2XHmjBLtM/Q8r/cVPcQdzTcBjNnLORmFTsrTBLDq+a1Zc2X6fXm9yqSWTMxU1l19guRrQkSel0gkkisLGcRKxs0jn/rxhKt22jJzvjCkqirIqVM+zp0b4NQpX9wKoDYYTMTo+clAq/u7l+dINYiOFWKZCpxOkTIN+JHVH5BP4gy/fmDRHfoEihCYPd0Ubt3EyxvbWCeepTNST9tJOevEs7y85a0JPIKpoe2d3Dhhqss5gH1lyVEWO7zMc9hY7PDyypKjl+yzjWonFyxQz0unM32GzqrQi3R7CwxenbggNZFiLuCKEaZ6iVUIzLzIap1rJvU+i4oECwffjda4AnSwkJC1gN81v85t7U0ce/sayjk9Icrv8aTan4LTKbDZSJmarbddY+PkZZnMVJ585yusu/ofIhMlqjDZuqv/gSff+cpUN00ikUgmHBnESsaMrW4rjzpe5NmLy5nIwaDZLPj+94fG1bZUTAd14uGqpfRv38UI1pTvu5Srw4lYrapFTiru2raY3ZWH4wbrRsrHsWqrit9PQ+hxXRXphpFHGan8ss56gloTK7l0vLLkKBvbVtBFWUS4q4yNbSsuWSDb0DCM1Rp/JsSel+q/6RShs11RzaR0RST8m/ipYXawhQ7TJ5OUgmNXUkdXZtVrp8b0LNbcQNy+cnMF/f0K3d7C0ckeDtBavClax3+i7iRbWr5AF+W6x2phKAMht7GV7Jw5o4xJbVgqFE8OT77zFc72+OnpGeBsj18GsBKJ5IpFBrGSMWGr24q15Tn2s4F0noWpURU1o6lxZg+7d0+ul50WQIacpQhFIeQsvaSiTrHtSMdkrg5/qcKDXuqZpgatV8Oqx+3Hl/G75tcJOMs5zUJ2sCVppSkfn+ojG4NR2mM3Li4efzkayGqPqRB1muk81naz7kTDY203X5LPT1caUFUVJD9/KvQSUgf
HAljBEZRwWPd1TXk5jJkAFsKYOW3+FN9f82u27wjGHa/NJggE4j9nEBsNBTujfcgTB/8k6XfSWmLnPLPpZwRLdIVuIlmwQKRQGza2nZIKxRKJRCIZDzKIlYwJ68EWFIiIOKXDeLBSRid72TSaGrf7Z5fEy264aineU+9y4dzHeE+9e8kD2PQIPsM7k7o6/Pdvzo0JZNXHlyo89PQkp2CnQ/s+w85SQ4/PWM/KVpYbTn2UoopuXTz+ctTO5kLPRRnAXmLy3MfoTjHRcKlql1OVBgA888wwFstEBkQK4w30zIQNpaK0R9hWABbLaDgcCmE92soKDnPqlI+P9r5Ap+8q+vr0P6MrktiQ5z5Gd+gTBi0RDJEfEY4zESIHC8NJq7Jqve5YENx8c5DBQe3oRsnHx8GizRyofiPlanosbncOixbZojZAsm5WIrm0TJV/vUQyFmQQKxkbIdUPM7UvpqC5eYjm5iGKipI9Ca25AZ4oegahKFBWNiWroVNNrjJi8EqYf+fPJv37+Ps358bVbv79m3PHtb9Ej88wZhqV71BPU5xnZQNNhqrGXbi47jo5gJ1qbI2Px6WAx+Kii8Ktm6bFAKeqKsjOnUOYzdMnkNUm9xL9Wg9HJm8UQPENoATiU4cVv5/CmvspdpZQWHM/R/puNZzsMZlgnqOAz9bejJ1e3feYCSet0AaYRSEXo+nMICLXYvYZNTazn6NHLXi9sdurQk/PNJu4rb2Ju7YtZtmyQOT3UR9+PzQ25sVd45qqvMdjQggFj8fE1q2zZD8gkVwi8tzHeKh2hDzPh5hEiDzPh+rf06Cfl0j0kEGsZGyY1UHaOvahP8BRV/WqqlSv0/Z2H83NCWmBO4Lc1t7EhXMfQ0fHjAtgAXbsDUOSWmmQQ9x9WdZ/JqZqH8pfyzrxLF2x4k0coNNQQVUd4nu9JrZskQPYqcR0xkMj9Yap4YrfP+XCYxpVVUEMMnd1yTWHIoq6xumu4xWqq2VX5FwfPffv5hBKTECrERvsLuQ0R0aqUCDlZE8opDq5dgkXFylMWl3Nx0fI4BbfSwlduDBl6GdbwEDSeWA1D5M3Z5aOzZGCzUZ0xdztzuHoUQuhUGwKthqkPlgT5qcV9eS5j9HYmJe0L79fCkBJJJeKui2CZrEh6jIRIodmsYFvbFw41U1LS8MNv2K+w4rDUcB8h5WGG3411U2SXAJkECsZE/7V1QhgL5uoYQ/ajL46rApzX+V7Sat66dICZyJVVUGamwO48v4Ytcs4xGqWVJ69bNNnY1O1G4a/q1tTmcnAeWREDmCnkvACJys5whpaUAihXd+mmOyLqRQeSyR9jaWIprc/N/ubtLerfVEmCsfZo9DMA0nnvhqQxogzsTzq+9qZMNHTyvIUdjkJdkExq6taP7Kj+m1K7UYqyAoCE+GMykHApxTwTLMpfhJyd5i+Pv3rOFa0SS841RjExiN9D1JYcz9Gp5IUgJJILg3PjtxL8r1Z4fXQjdN6QnmZ6x0OtN8YF3wfaL9RBrIzAEWIaS2uUA6c7u0dIBye1u2ckdjqtmI92KKmFpvN+FdX49u2fUz7kibUVybzHDbDlaTMVroEPT0DE9yq7Jip52ae+xg/3vhPVIf2E2BW3Gu5DPMC1awwH6N/97PTIovC7c5h8+ZZSSJIGmV00IG6oiAUhf69B7A1Ps5Rzw2s4iWmYk63mPMU4KOT8qTXyuhgAFuknjU9CuFoUBpyluI99a6aorslB/+IZVztdDrDnDo1GhCfqDvJEwf/hK7QAvSvY4HTqda91tbOQgjja11rdzmndb+HUns/p/Ovx3TGQ3iBM6oTYGt8HNMZD6KoCAQoH/VFX58O5+NMZqb2mZc7DkcBRvflXIbw9AR0X5tKvvNffs7+P34d/XaH6ekZ7bfkeTm9MZkUiosLABYCHRltM5kNklzZ+LZt58LZPlV452zfmANYyZVLqfkPus+nrqWOZzrPAF9p5LmP8bOFD7LY0cvcmmrWhF5ICmA
BRsijgSaUUGja1Mau4DDPF2zGxkX0RIZi1bFbc9bw2ZqbsHg62EAzmUyomBhJ2m8q9d1M6KXEMLXeeBVWH61+OdYurKoqyPYdQUrt/RHxpuzbqijxIkxRO5+Qk1Qe1x6PiQdrBfbc1JNQWruNUtev9f5LUo1e4QPrMHu6UYTA5PVypO9WrhLnyPF0MqfmXhyOAq525PPwEv2abolEkh0jTG1WVKLg1Owld/Ko42CKAJYUz0uuFGQQK5FIJo1HVn+gOzC9p7KdXFMms7oypfhSkec+xjdqSrnbN1rDrKZn6aMFX9OhNjbPfYzCrZsw9fVSghcQcf6rserYraZVrA/sjh7jALMxHuyoKcjFnCcnKXtA+3s8AyUFs4EysIsuvBQbtiuWfGWQRhp07cKqqoKczr+eMGbK6MyydWHuuScQV/phbOeTzKDIRwwPJfUB0XbHTC7oqZr/BW/xOjcn1eiZwkEUwigRn927ORhZsR6tuQ1j5vm2z8hAViLJmOmZ8aj179rEldnTzfG2+eyjBhmozmxkECuRSCaNu7YtZkf127jMnqgP8I7qt3nquIsde4KYMrD2kDVxk4vbnUNFhY05NffyOjeT6aBAQRUkgqmvjbU1Ps5h/9ejtaWanUw+fhqpZ3muO+oJXU9TxkFYsamPMGYK8OmsRGR7XuoPEEOYkoI8hTCduAyvDxv9MZ6vgkFh5SHnSxxoaNdNpT3quSGSrutKstPJzVW9ZLVgvTjy/zI6eIm72bYtXjDK2M5Hnz6Ko8EpkaAzdnIBiIpaNdBEI/WEMdPBQt7gJvRq9BKDVWE42aLww7aKrNorkcxU7qv8D1IFslNlgWVrfBzF7497zlj0bhRlmgblkolDBrESiWRSuWvbYk6encO5Hh8nz87hrm2LAXWFaE/zMKQJZNML9kiyIc99jOWOf8XhKMDhKKCmZhZ9fZpFSuaBmcBEA02AKgKV6vN+WlHPYkcv8xw2PnfdxKeIm854aNAJTgexqWnPIyOI/Hx8DY/SHV6Q4V4FT6/5LcJiyTq1NxGrVWAzjJsVBsknVhxPE4BSVyDjz38LQwTIi65Oxqr91tTM4mpHPg6HjcXzP+ZE3Unc7hzWKaPBvbpv9XOKOc++P3+OC8o8wpi5gIMLOKJB5DLnm0mtNSoRMMJFV9RyS10ftxDGzOlIfbKRqBVk6kOemonYh0QyE3j2w7+OiPglYyaYZIFVUzMrqgb8qOMgtrqt426DXqCsTZLGqrh3UpZmT4J7qqV46JWOFHaSTAtkwf3Mxe3OYWNNTkzq6mggZbUKtm8fmlIl6yvp3LTVbeXOliW8zi1MRBqWQpiQtcDQ4znPfYwfb36TdYE9cQGmNTfA9h3BOBuWxsY8zpxRcBYN8LUhNz8Z/DLduChVPDSJh1gRSQfWEHY7A43f4zAr+E7tIL3CrntMsYJHQlEoFx/SpSMglIwgZCtE8flYaCA6ZLSdjX5mmYL0hoswm1Xtu6Iigc+nMDKS/fduJkgYE3Z66cOesapwLsMEFQthYTRfLSjmAjvYEk23Hn0F+puf4zAraGoI4vHaKKWLr1p+zsHA8oxWs/PxxaVya4ScpZjOeFgoPjT4XgVldNJNacbHaoSZIGd7/OnfKJkwrqQ+cyZRMm8OZhEcoxijIBc/+ypbuf34sqw+V+v/PR4FRSFOCM5qFTw7azP/0lfBPmrTrr5qE3TV1cGkLBJ5Xk5vpLCTRCK57KiqCnK2Z4ienoFkL+EpDmCvJPLcx7D+8PkJC2BBXZXr374LIE50QxN6sjU+TkPgsaSAxz9iidY6u905cTP83d5C9g2uidasdgkX6zjAYZajAIdZzkJOk+M9z8KaO9lUm0uvKDY8Jk04CEARgq/xSlJKrR5mQph8Pg6znAFs6Is6xf9dzHkOsZJ+69U8vea3WK1E/VH7+kyEQqTxptUnhIkNNOOlOKugboS8FAEsgEIvV3E3h6hlV9Krx96+hq1bcuj2Fqq/BeUcDCxnDS/
iMnWjEMau9FKQ42dU5Gr0MUg+W9gRXVmFUeGp8AJnSguhTsoJR1eNx4rgnsr2cWwvkcwcwguccf1lPOnuGQoj5LOhbXVGmTbaiqvDUUBtrdr/g5KkZO73K9x/8Qc080AGASzYzEP09PiSAljJlYkMYiUSybRBeglPHrbGx1EmNPNGcPNffQxA4dZNHPV8kYXiQyyeDj5bezMn6k5iOuMxDFS0Wmd9H9H4v7W04ERPVS9XERTGQV2SKjHLeZHqhMGQca2q9nmjokHq+4s5z02mX0TrUs0EqWEP53GwvOhn9G/fxeOvfSnpuEIhbZCW3SSCoijsUzJZhRgbAhPNPBAXyCqoIk6J9jyD2PgJX+XDT/wV53p8NO7NZ+7VsfXC8TWrvVzFvbTQyvI44amD1zySQc2atp/0HsCJDxMh7qt8j6eOjy8VXCKZKfgaHqXR8pihEFsmjJCXVowxduJSL3BN2mfITKZ9pi+UrKYvuXKR6cSSaYFM85BMV66Uc7Nk3hwUISKrkBOzElvMBXrsFRzx3so6DsStuOYrg+ybW8cjfQ/qpoxq3qPz5hWkHcSAmhbsoiurtN5DrIxLZTXyIjXaXiGM0Fn5tHOeAdNcRsKjAV4uw7yg3Mcde29kuGppSs/F7MnUV3n8nxObXmwipBs4K4QJKTk8u9fH1q2zdCYhknGZPZw8OwdQbXoeaPnLrGw7zAQJYcJMOKHONV4xem3FL2h88/MZ71cy8VwpfeZMIlrS4VEoUrx4DcozMkFRBOfOGVtrLVpkiwSwE4/ZLDh7Vv+z5Xk5vZHpxBKJRCLRRRNfuolXMV7ZiqximTJLee2lmCPeW/VFlUQ+DUqT7sy+NTcQ9R7NVLjLRVdWAktldCbV0WYn0KToBrAAXkriAlhQVyC2iB9Q0PBtAMwTqid0qRS61ZVTTVyp1CC10EUX4QVOg1V0fWJVjZ84+CdZ+06GMSEwE7DOJlRUEvGaTl7Bf6G9Mqv9SiTTnahHqmM2rzrW82nHUFSY77qFOeMWyosr6UDBK4pTZEmk76/T9elnPJPXn4Uyt6CXXAHIIFYikUhmAL6GRxFWKz/ntphAdvShRNIve3oG+OMfB3A6MwkuFeppMgwOPX0F3LHzBvYVPYQr4v9Zau+PE3VqaBjGak1db6qlBRvXa8VjYYhG6hH5NkLOUvUIFSXj7cdKLyUoXi+Q7WBqrJlGgmR170wmIFK/rqVv69UP5+Oj0fIYvoZHs7K/ilU1ztamB6CUrmg6svJRn6HqsFQjllxJaOJ4n/S0YSLEKlrjPJG9PiubaxQeXtKVlf1NnvsYxdeVU+KYzVM155ImowQm3Ws/Hbm5IjpBCfpqw6XmMyn3oUSV2seCzNqcScggViKRSGYAw1VL6d++i5CzlNeUvyboLOPj5hfo6Rmgp2eAcz2DcfWD+sFlMpqCsB4LFgiGq5ZyW3sTJ3uKOdfj49fvE1frXFUVZPv2UUGvUns/G/JfjAa9LjqjCreN1CcNpCzmIAVcJNY2poV7WWE5zsAzO/CeepcLPRfx33MfNj4mvUBTJqQP3vLzM91vpundevtTIMYyBwRFRemCWMFneCfNe6ATV1L9sEKYNfyQv1llYbhqacar6LkM88jqD6J/p7fpid+vNTfAw83z8J56l+GqpSktnSSSK4kT9b9hXWBP1CJLr68IMIsX2j4dZ3+zdessw0DWVreVwpr7OeK9lYWcpsvArkagJHm8G09uCgqUAQpGLlBbk8fnroO6urwkW56tW2fxtdCJFOJ6AgGRTIuxoAb0kpmBDGIlEolkhjBctVQN6s59HA0IjEgMLk2K/qDDaffxyD3/Sb4yGPe81Ro/I5+KWEGvX78Pf9dRFQ16T/bYuaP5JkLOUlYoR9lX9BCl9v6ogvXO3QHONP89IXsJYUyj4ko7m+OO78EfLuI9/gt6Kajp1YozD3TLTV04HAUMDqYLTAV2zme8X1CiQlJ6r2mPvj5TipBYcBOvUm9+xtAPUsN
MOClFXGDiJ3yNvNd+BmQy0SGYxRB7qv856g8N8MjqD8hF79wQ2G3+uEkMbeU+5+1/ZfH8j5nnsHHNH36Zsu0SyZXCd/oezMjOKrF23e9X4gSWoiuijgI+3dLAA+yKiuQZTaKV2nqTPN71rnlrboAa07OEhYKXqxCY6PYW0tJiSVrh9fsV/tF8FxvYq9PvarX/+h7ZmaHww7aKMWyXTKLH+acdQyx0EE3ljn1MRFq3JHuksJNkWiAL7iXTFXluqrjdOWzdkhOnVhvr9xrr9bpggRrATid16fkOa4wXcSKqoFEvJSQP6LIRVcr8vWV00MFCyugwXAlJJoyipFfzBCKiVPGrqBvYy142UUwPXq5Ksy3oz3MLDrGKW3r2Aep50dCQh9ebqLosqKwMcfy4vkfribqTPPTif6U3XASA3TZE4/fRPWdO1J1kS8sXEgbzRt+1oKfHWFRGMvnIPnPimOewjVmVXBNY0mpeYwPKxP4hEc3jeUnlWS4efznutcS+vsm3xVDAT7ddEXG2wyynIVKOYiKs2z+bCEWstsAU4/mdGv0+QO+8zHMfw9b4OKYzHsTcIpRBH4eHvxEtk1G10TP7/i3mIDt3B6bVfe9yQgo7SSQSiWRSqKoKsn1HMN7HN6a2dTrbI+W5j6WtlbyAg4NFm7HbtXosbeI0GxGSzIPdr/IKAE08nLGlhZlwxim8AjVQVghTRgcvsYq9bAJUYar02xr7Rd7NSzgcao0bwPvv+5I8npubhwwDWIC7ti3mP/5oiaazv386aHjOPHHwT3RWo/S/azvelMcmkVxOOO1jt7vR+go9ATbjwExQRke0hCO37Q1KHLPj/L8T+/pVH+3JSjRPoHAVPQCcZiEBZ7lhcCpQEJgIW/J4MXdtRn1lpqnIee5jFG7dhNnTjSIEpj4vh4e/wToO0BVJ385mAiEQyklrLySZWGQQK5FIJJKMmOpANarSOW9O3KAqkUQxkRP1v0k7sGm1rOGupj/jdGMLLqWb7BWBs8kWUniRalpZzgqOsLvycLT2rJjzhvsKYc64VpmIvuhLrKKDhXFWQ+kotft40tZkOGAUkdq82Nq7yTw3jIWg4r8HC0Nsq35nwj5XIplq6htzsOYGst4utpwjGwG2Yi7E9RdafoXZ003h1k26fW54gTNL0TxVBf0eXuSwshJfw6OGYk/afkVhIXfu+Cv2m2uik3O2qBZCLIJ7KtszaoWt8XEUf/xEm57SfjZk811Lxo8MYiUSiUQy7UmcNTcaVMXZRUTERDb0Pc2XeR3jQFPh3tAByuurmVtTTZcozaBFserO2atpagrACrDm3x+K1p79R88sQ/EUp1NEa5WLi7U2GKHQRXnULieWYi4YbmW1Cr5710lWhg+xn7Vpj8vvV/jWlslVBDYSgipWvHHCM3ur/yWu9lYiudxJzICx28OYDGr4zWYxmiWzfSg6kZRp9kY6FL8fW+PjSc/7Gh7VtVJLRxALW8QPKNy6iabQt5O211TpAZS+PoarlnLH7i/xn85KQkoOHzv/lLUVv4hqBZgJcl/le3EChakwnUkWJMzOhi2ZifquJZkhg1iJRCKRXFL0bBfSETtr3spyyjmN2T/AZzfeGre9XurcIDbe4CY+QRdGQdlI2EJfnym6ypgKMwEOsRKBiUOsJB8/erfTUasI/c/UBkyK10uJYzYlV8/lVcd6/J7epG1iV1aqqoJcuEA0hRcEZpN+IK0Fy7Es5ZjOewW23BG2bx+i+rU1KH4/KzlCGZ0pvwsA34hlUhVBH1n9ge4A9+l73kkSnpFIrjRisxzef9/HnmYdcSWrYPfuId1MiIaG4SThPSO8FMf9rfW1CiFyCJDj6Yzrs211WyncuB4CI1gZJNvJvF5Kon3NftbGlUBoKc2grvbG1q+GFzjV4PnNz3O2x09PzwBne/wZB7DaPhMZjw2bxRzMWMxQMjGMO4itqKjYU1FR8X5FRcVvKyoqfllRUbE45rV5FRUVr1ZUVPyvyOv
/+3g/TyKRSCSXLyfqTvJgrYhbKX2wVnCi7mTK7bRZ81aWR1U1BSa6Qs44OwmjdK4QOXxESYxH7tgJkcNqXqKWXYbpZ2aCbKCZUluv4X60AZOWsnc4/H+xjgMxPpCgBcH5w33kvP2vcdtrg9uengGGP1EeSSBOpgvX6JpxkZ0f2e4lOVBXyCtQ04JjVyj0bI2SUXi+7TN87jomRaHzrm2L2VH9dpLdhwxaJTORROX4xJXXV5YcZbHDyzyHjcUOL3lHDrHjnn/DpRhP4mnEBnG17OJuDkXtfVThpdEygleWHMXa8hyHQ0t1+q3sWckROlhIGHNcSrNQFIZvvjXjTJzPXUfk2Hv5aUW9YdmJ5p0ei15/p05GqqUeNvpJ9FjXVNWlqNOlZ9zqxBUVFbcDP2tvbw9E/r+jvb39k5HXXgA+bG9vf7KiouKLQAtwbXt7e6YfWo5UJ54RSDVDyXRFnpsTR577GJ+tvZkukTxb7lK6+N3e15Jsf9zuHJoaglYUWaIAACAASURBVHi8Nlx0MYAtMliKx2wKs3vPMI2NeXg8xvOzds5TiC9DJc104k7aIEbv88Lk4zesr9LUP2NrVcs5nbJd+fiiwVvieVkybw4LxYe62zudYU6dGh2YORwFBsekqnraF12P2dMNqJMGG9jHAIUG28QTq1gtmZnIPnPqeGXJUTa2rYjrdxTC3F/xBt//5mk+u/FWukL6PsuagvkeZTN/Kv4/A0uyUVx00kl52n4rNaNj+2IusIMtcX2iUBT899xH3ms/i/ZJsYRtBShDfgiFOKysZJ3yHIPhWdHX8/Gx3/IAd+y8gdkb7stKnbgbF06bl/rv22R/dgkYizrxhFrsVFRUFAN/AKzt7e3hioqKAaC8vb39QuT1fweq29vbf5XhLsuRQeyMQN70JNMVeW5OHPZF12PxdOgqPqoDqGb2s54QZkyEURQIicT0XmMbG2tugGUrBa2tFkZGjAPPGtOzvBi+e1wCHunaYyZoYOkjKKOTRuqTxJZMhNKqYbrMHk6enZN0XtoXXc9RzxdZx4G448pXBnlmrxI3CEsXxGr1x4f9X0/aX2YIXHTyZNEz3NX0Zyn9iCVXHrLPnDoWO7y6ll0KYfY2q6muiXY7sVhzAywMfsB74U+TbtJKiVjeZNJvZUouw7xANSs4QthZiq/hUYarllIybw6KTrwS2/saBdNldPCfzkrM3V3yvJzGTAeLnY3AP0YC2GJA0QLYCF1AJooZEolEIrnCMJ3xYEc/vTaPYZqpjaashTETEmb00l6N8I9YeO21HGy21IJHr8xdyb6ih3BF6q/sNj8Wy9gnSvXqNUMGt1cFYagWnEk9lpFSr6/hUVZY/yGursyldLHjnn9LWkUoKtI/Vu354aql9G/fRb1p2xgDfVVUakPf0/x485uG6XwSiWRi6TYYYgtMNDbmRVORS+1aWmw8/hFLRgEsQCnqyuh46kgTGSGPepq40HMR76l3oxNgevWrxLSyleV0Gvhtd+HSFXGSXAEIIVI+rr322lPXXnvtBYOHOeZ9y6699tr2a6+9dl7k7+Jrr73Wl7Cvn1x77bXfSPeZMY9yIZFIJp9Dh4QoKxNCUdR/Dx2a6hZNGp+Z2y0gHH18Zm73VDdp5lBWJorpESB0HmGD57N7r6Koj1TbK0p8sw4dEqK4ONs2qA8zAXGIFaKM00IhJMo4LQ6xXJRxWvf9ZZw23Nkhlot8BlJ+Xpk5xfk6eiDq/nLvEWWmLrVd5m5xqObN6Nsslvj9WizJl71CKOvvQ/d4y8om8iySSC4Jh256PtJfqfeKYs5Hr6HpilG/M9pvhoVCSNSY9o3r+s7PV78fAaKGXSn6zuz7VIVQ8jjk0CH1Q8fQb8o+6LKjXOjHhEmPtAoM7e3ti9K9p6Ki4m+ARuCm9vb2c5HteisqKqioqCiJWY11AclJ7WmQ6cRXPjL9aOrIcx/jGzWlvM5p9YlOuGnVq/zj/hu5ePzlqW3cBPPled28J+J
nmd/7aAGfmdPNP/1+ru428tycOPIeegRvTXH6N6ahmAt8RJFuuu6CBar9RKq6WGdRP6HS6znquYHNyk68ws6ovFI2CNaxjxUcZpnzTUyeblCUaNpbUmpvjGWEunX8SoIqEmVNeCX+876z+gPOn09OJwbI6x+icHCQB9jFPmoQI6Op2J0hJ+uai/AP/YK7ti1m584cGhvzOHNGYcECVfn4lluCnD8/uj87Q7r1x9nQiQvR1cUFeQ3NGK6EPvOVJUfZ0LaaEfKiz/VSwqrmv+SXzbvZ5fxeNNV1POgp7mayT7c7+fqtqgryZO7jrB553iC9V+0LBArN4XXY6MfH7DG0WrB/cBXL2t9kuPLL/GPb7Rj3nQrG/Zk+LrqgsxOxdi39/UOAqk5vGhwEsxlCoZijSe3tqvW5w2ULyYPL/ry8kolJJ858m/F+aETMaTtwa3t7e0fCy38PbIi874uAFfj1eD9TIpFMHN+oKeN1bmZ0EK/wOrfwtbaHKF5QcsWkAs5ecmdSAKui8B8X9VOVJBPLcNVSnHYjpdvMBjkKYXaYtvKi+d6kNF5rboCGhmEaGoax5gZ0t8/Hx1MfPxCpH92PVxRn/NkqAiKehDXsYS+bCDtL8Z56lws9F+nfe4CQs5QVylH2FT1Eqe2CrmWEMJvxV99PyFkap7hMCosfu20opSKvrfFxHvB/j2YeQJCcij2IjScO/gkQb9uhiT4l2h4JxWiIkM2kssJhsfyK6UckM4PH2m6JC2BHMdFMLUc9X9RVx80UTUV3bk01n/S0cVgsM1Tc1ds20QtbU2i/c8cX2UBzRFE3FQp+bFl7uwLY6GcNL5Lj6cTW9ppuDW7iZ2VKLsOj3rB+PwX13x5VJQYOh5ZSrnRiJkQ5p2lleQpvVxHtc3Pb3oDa2ozbIbk8mAh14vPACBAzf8tNkZXYq4FDQBngBza0t7f/cxa7L0cKO80IroSZ28uVVCIvZXTShQun3Ud9Y85lrdC33PHPvM4tpBK00UOemxOL251DTc0sxmbFILjJ0sb/s/P3AJyo/w3f6XuQLlyYCRPCjMt8hkdWf0DwC39O07d8dPvskddMlNEVFVQaq6KmJmai0cpy6k3b6A47WeAcXRHRiFX6jSVcZKc3Mu/7aUf6FU+rVcTZaOidlyXz5mARIwaCUqPtP9ejDlzz3Mc4Uf8bvtn3XbyUEPubWHMD+EdyMLpeMFRlTiaHAD8038cdu78kRZ5mAFdCnznPYUspVlRGBx0sJBSZwMoGLQiNFVeKVSsXECdqlMiiRTbdTBNNhVy7rtW+sYxU6uqHWEkDTZF6UqNrffR5M4GoboHRezJFSRCESlQnbmU5W9hBLyUJnzW6TT4+rAzq9p/abzTaeDPnz/Zl3U7JpWHK1YkngXJkEDsjuBJuepcrqYLY2OcVguyvfpPgF/5cN41puqKlXXk8qdJFZRB7KSkvtzE4OLZEIJtN8Ifvv0BB/bdR+rwcjqxiJlpKrL36BPs+Xoni9+vuZ6yKmqW2C3TkVaB49T87Mdg0VNVUFJ7d66OhIQ+v1+jcFCgKuteZ3nlpX3Q9OZ5Og32paOrGee5j/Hjzm6wJHCCERfe9ZlOYUDj5Oyqjw9DqyIgCLvKR80+zHvBLLj+uhD7TSOVXQ5vMEorChXMfZ7VvoyA0MegSViv923clBbLz5hUghH5/4XSKuHvzxhqL4aSWmSDByLXfalnDqkALhv0QAhdddFMaN4k3ViwMMZt+eilGQUT7Yi2QBbiXFoPV8HiKOc9FCgkwK+ZZoWvZc77n4rjbLpkcpoM6sUQiuWKIv5kJctjQ8pds3WjSTWOabrjdOVx3nY2amlmRAYNxAPvp2VK58FJiMggsM8Hngx8/8AamPi8K+vVQAhMH/ngXh/1fN9zP2BQ1BV9ZOhuRbzP8bL9fobFxdOBlpKrZmn8/D9YKvF7jc9NMKJrum8lEka/hUcyEDF/Px8cjqz8A1NTj9YGdhgEsQCiskK8MJu2jkXq
8ZFfbPEChVAiVXDY8VvkqqdLmtf7D6PpOxZkz+td7Ylqs4vdja3w86X0LFhi3K/He/MWKs+gfhyCEOZqSG7zmUynbHMZMBwsJjyNsUAijEKaY8ygokUkwU0zpg/rcvbSwhR0ZBbAAvRQTSHqvuq91HKCV5epT5vEH35LphQxiJZIZzo1Xv0fyTU7/JhkiF38o/maROGifDrjdOWzdkpMyQIjFSNRJMrG43TksLvUxIPLHsReFhvAT0b+M6qEEJhpoMtxLI/U69WDpMn4UXnstJxqMGX22x6PgcBQw32Fl/TX/E2G1xn9Kbi4NvgYG03wPRjY9RgxXLeWeynb0rucCZYAd1W9Ha2pNZzz4KEy5Pxed7Bejlj2xdb1jmQQYy4BfIpkKbj++jPWzj6DXJ1gYopF6hNWKr+HRrPab5z5GqemM7msmwqMBl/aczsRPQ8MwVqteXxV/r/P7Ff7Tt4D7Kt8DwoyWAWhZVgqdlHMvLaxt/1bS9hqxE2OpJsnSIYCXlNUU4EsZoI6Ql5BCnA7jLKtBbDTQpP6Ks2ZRMm8O9kXXyxr9KwQZxEokM5wfveOKCWS1R3YYzSxPFd/aYsY/YrzCNIqIHLtksnG7c9i60UTX8NWMrR52lNjgMVUwZRjgWiwst/+M/azDZeqOBmg17EkrdHLmjBINxow/Wx1Uhcjh+bbPsPbzvyLkLEUoivqvrcDQzzEWl8FgNxVPHXdRXR3AbI4IUJkF1dUBPjwn4kSh0gWU+fh4sugZljnfjKzAmOP8bbOdBFAIZz3gv5Koq8tj/vwCdXJjfgF1ddNr4k+SzBO/v4MXKp+jmPNo90Y752nhXpY534qm+ua5j2FfdH3aACnPfYzCrZtoCn1bt58JkcPdHKKWXbSynHJOYxZBFjt6eWXJ0ej7NK9Xkyn9vdrjUXjquIueHh89PQO4SC43GCEvIRU3FlWBXfukdezD+DoXjAbLepioF0+mEGKKJdN7RPp63Ojn+XwoQqgCWjX3U3xdOXnuY7jdOUmidloml8OhXrMVFbZpmXE205E1sZJpwZVQQ3MlkOc+RmHtWkwiRDaBhiYoMV5eWXKUx9puoZtSSunmscpXuf34sqz2UVeXR0uLhdTtV/uTG69+jx+9k/qGKs/NicGoDmwsxNaOtbKcuzmkW9/qooPO2BozQBTZGWj6XlydWdTmwtPNYWVlZKClL3TidIZ5t6GFwtq1HBbLkmpi9VAUgbNoAI/XRildNFEfEVMpN9wml2H2VP+zoRrxeM/LPPcximruMahvUwVf7mi+CYDCrZt0a4tbWU49TXTjopQuerEbWHYI7qt8j6eOZzJ4vfJYssRKW1uiWrQ6ubBt2/BUNWvSmEl9phaYHvZ/nQaa6MJFKd002hpZOfhcnG1OrMhbK8tZw0GDetVk4TQLQzxb+VLc/dBYz2IUs1lw9uyo3kM6warEdtSwhz3Wb+NfthLr0VYUv5+v8FNDkcRizmPDR5dB36YQxmnz0u3LZqXVuH2ZjFMS7wMateziWWoiKdKJQlUk7dtiEezcOTStNUAuZ2RNrEQiGTeHlZVp3hE/oaTZmoyXV5YcZWPbCrooQ2CiizI2tq2Im4FOR577GAdbUqcQW62C5uYhenoG0gawkokjk9V6EyPkku5cEjSaHon+tZIjbGBvkqVEPj4eq3wtbgW0v/k5ets7koRShquWRi1ybjnXzMmeYpqbh5JS9qxWVSxluGop/nvuY4VylP2MptsarUAIAd3ewsh5Xc46DvBVXjG0wTAR4gWqU9rpjJfhqqXcV/FPOm1WB65LKs8yXLWU4aql9G/fFf0ew3Y74SI7QlFY5nyL3zW/zrkeH79rfp3mogYsDCXtr7IyNGMDWLc7RyeABVA4eDCTbBHJdMbW+Dg3+/8Hq2ilk/LovWu97wejtjlbaslzH1N9pGMIGQokKSQOzwPM4lttxjX+RoQSsn+NLc6SKaOTPeZv0r99F75t2+nfvotwkZ3XuM1wm15
K8GLHqC902rw85dsyJmufWFL1t7Hk46Mpxptbo5ZdNPNAZBIv+drUG0MEAtOvdGqmI4NYiUQSxdb4eKTeMFXAEatYHGZNTmvczGRsalVxRTnF15UbplnFvvextpuTVrQGsfFY281pa1hiPfeMBwZqKtiBobtZ/fbmFMcnmQyMxUhG0/QOmu/nBaopo4NUA5Q79nxZDaQi79ptf5TnK5/HZfagEMZl9rCj+m1uP75MDU7PfYz31LtZ2btoKXtOZ1hdSXWG41SHfdu207/3AMucb3FauYYQ5hT1Ysl+rT/hdsPg+yB3s8z5VsZtHSuNb36e+yrfw0yQUe/bvXy/+rdcPP5y9H3RIP/cx/S+30Fve0fSdzpctZTb2pvY2SzivrPm5iGOHx+7kNflgF46ooY66NXvTxMDDMnlxyZPne6qpFaLCaCMjFDwrW+Cor5H84XOtqzCS0nc+VVUlD6Iczrj31PfmENmJUNhvso/MvJXN2BrfJySeXOwNT7OQNP36G9+LuWWA8xG79gUBE+NfIuVHGY/a2PStGPJLOtSAGWG5RxqWrN2H9DrS/ezQbeN6ZhupVMzHZlOLJkWzKT0o+lMybw5mEUwK+sRFx2c7FGVSrXUKiNbk1jLgDz3MQo3rkeJjOSMLE+iVgY6dgOaH979fd/PQMkwTBldkXSvLv6u4hBffXNL2uOT5+bEoNXExgqDad6ISyrPcvH4y6NpvWc8lHAer0hWwC0qCtPePv7U9Ylm9pI7+du2Kpp5gEw8FBXCvMQqNrMj4tE6ai+xwnKc/p3NKYNueV5OD/Q8P2NtloztUNRUT+/uF6LnfGzq6eXMTDo35zusKXyZwxxiVTTN2BXxqU5XSpAO7fwC2Fxj0lHmjX9fYvprRYWNvr5M7vFhatjLXjZFnxFWK/5lK3H98Gl6dfrn1Kje85pXN6gB/Wgadhc3VX+C1lYLIyOpg8UyOmiknmpeSKjnDbO24g0a3/x89Bm9cYk6eZh9QDpRpVOSZGQ6sUQiGRfhBc6sVUe7cUVXWl/e/BYL/e9hIhSV7o8l1jLg5S2/ZGHo95gIUUIPisEMrIkwJkIs9L/HifrfRJ/Xbkx/2/fdDAJYQQ6hmHSvcmrb/5YTdSd1353nPsZPK+pZ7OjFpIT53HVIUYdxUlUVZPvuMKW2C+pqKR3sV9bzN9W26Kpf7Ipf4958LJb4c8JiETQ1Tc8awovHX+YHlW5q2BO3slmA/mDeTi/rlOfwchVa+pqfiFqxImf7LxcaG/PiAlhQVWFramaxaJGNuXONMxDWmp+ncHMNZk+3ruCMZPpjnPmjEptm3Ek59/AinRkJGxmjOQJUVQXZW/3PuJSuqHWNal8TptTerxvAAjQ1GakbJ2KimVpaWT4qNOUf4NMtDSwVRzAlZZ6kV3fvjJRT1LKLck5zN4cAeIlVnLZ/jm3bhrHZUu9HIUxjJEVYSQhEcwmw6Jt/EfecVhJBcXG0hWNRWbZYxISUTkkmDrkSK5kWzKSZ2+lMnvsYP978JusCe9KK1WhoIjtailT8dsmG4wJ4tnmQB2vCGXxG/CpWPj6eaTZRVRWMimQYz6iqfYaZEFYGIylO8bjMHk6enRP3nNF3YM0NsH1HUIo6XELc7hwaG/M4c0ZhwQJ1AHG5fP/a+dnKcu7hRYIxfqw5BJjNR5EANh7tego5S/Geetdw/7LPnB6kWmkFyDWHECYTgUD86vxNvMrPU9QWtlrW8HDhLjx9BZfduT+Tzs358wsIhbKbdFIIZ5XtpLsPRXDunCrYFJvBkulqvtudw1Mbe+kOfQKRwqIGVLEmP/lx90MzgcgKdPqsE30S36vex51OgceTuj0gCBWVcE3fr3VXtI1WS6+6qpCL+56n4MEtbBx8WidrRq+NKkVF6gTq5XINXo7IlViJRDIuhquWcsfOG9hX9BCuiFhNsdKrI9aiko8vOiPaQJNOUKpjOG4y0diYl2GQnFxn9NTGXlUkQ8c/L5Gw1UZAycVHge7
[... base64-encoded PNG data truncated: scatter plot of the embedded vectors, red = negative, blue = positive ...]\n",
+      "text/plain": [
+       "<Figure size 1152x648 with 1 Axes>
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plt.figure(figsize=(16, 9))\n", + "colors = ['red','blue']\n", + "for no, i in enumerate(unique_label):\n", + " plt.scatter(X[encoded==i,0],X[encoded==i,1],label=unique_label[no],color=colors[no])\n", + "plt.legend(['negative','positive'])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/embedded/6.supervised-embedded.ipynb b/vectorizer/6.supervised-embedded.ipynb similarity index 100% rename from embedded/6.supervised-embedded.ipynb rename to vectorizer/6.supervised-embedded.ipynb diff --git a/embedded/7.triplet-loss.ipynb b/vectorizer/7.triplet-loss.ipynb similarity index 100% rename from embedded/7.triplet-loss.ipynb rename to vectorizer/7.triplet-loss.ipynb diff --git a/embedded/8.auto-encoder.ipynb b/vectorizer/8.auto-encoder.ipynb similarity index 100% rename from embedded/8.auto-encoder.ipynb rename to vectorizer/8.auto-encoder.ipynb diff --git a/embedded/9.batch-all-triplet-loss-lstm-embedded.ipynb b/vectorizer/9.batch-all-triplet-loss-lstm-embedded.ipynb similarity index 100% rename from embedded/9.batch-all-triplet-loss-lstm-embedded.ipynb rename to vectorizer/9.batch-all-triplet-loss-lstm-embedded.ipynb diff --git a/embedded/README.md b/vectorizer/README.md similarity index 100% rename from embedded/README.md rename to vectorizer/README.md diff --git a/embedded/data/negative/negative b/vectorizer/data/negative/negative 
similarity index 100% rename from embedded/data/negative/negative rename to vectorizer/data/negative/negative diff --git a/embedded/data/positive/positive b/vectorizer/data/positive/positive similarity index 100% rename from embedded/data/positive/positive rename to vectorizer/data/positive/positive diff --git a/vectorizer/utils.py b/vectorizer/utils.py new file mode 100644 index 0000000..8254975 --- /dev/null +++ b/vectorizer/utils.py @@ -0,0 +1,59 @@ +import sklearn.datasets +import numpy as np +import re +import collections +import random +from sklearn import metrics +from nltk.corpus import stopwords + +english_stopwords = stopwords.words('english') + + +def clearstring(string): + string = re.sub('[^A-Za-z0-9 ]+', '', string).lower() + string = string.split(' ') + string = filter(None, string) + string = [y.strip() for y in string if y.strip() not in english_stopwords] + string = ' '.join(string) + return string + + +def separate_dataset(trainset, ratio = 0.5): + datastring = [] + datatarget = [] + for i in range(len(trainset.data)): + data_ = trainset.data[i].split('\n') + data_ = list(filter(None, data_)) + data_ = random.sample(data_, int(len(data_) * ratio)) + for n in range(len(data_)): + data_[n] = clearstring(data_[n]) + datastring += data_ + for n in range(len(data_)): + datatarget.append(trainset.target[i]) + return datastring, datatarget + + +def build_dataset(words, n_words): + count = [['GO', 0], ['PAD', 1], ['EOS', 2], ['UNK', 3]] + count.extend(collections.Counter(words).most_common(n_words - 1)) + dictionary = dict() + for word, _ in count: + dictionary[word] = len(dictionary) + data = list() + unk_count = 0 + for word in words: + index = dictionary.get(word, 3) + if index == 3: + unk_count += 1 + data.append(index) + count[3][1] = unk_count + reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys())) + return data, count, dictionary, reversed_dictionary + + +def str_idx(corpus, dic, maxlen, UNK = 3): + X = np.zeros((len(corpus), 
maxlen)) + for i in range(len(corpus)): + for no, k in enumerate(corpus[i].split()[:maxlen][::-1]): + X[i, -1 - no] = dic.get(k, UNK) + return X diff --git a/misc/1.attention-visualization-bahdanau.ipynb b/visualization/1.attention-visualization-bahdanau.ipynb similarity index 100% rename from misc/1.attention-visualization-bahdanau.ipynb rename to visualization/1.attention-visualization-bahdanau.ipynb diff --git a/misc/2.attention-visualization-luong.ipynb b/visualization/2.attention-visualization-luong.ipynb similarity index 100% rename from misc/2.attention-visualization-luong.ipynb rename to visualization/2.attention-visualization-luong.ipynb diff --git a/misc/3.bert-attention.ipynb b/visualization/3.bert-attention.ipynb similarity index 100% rename from misc/3.bert-attention.ipynb rename to visualization/3.bert-attention.ipynb diff --git a/misc/4.xlnet-attention.ipynb b/visualization/4.xlnet-attention.ipynb similarity index 100% rename from misc/4.xlnet-attention.ipynb rename to visualization/4.xlnet-attention.ipynb diff --git a/visualization/5.bert-topic.ipynb b/visualization/5.bert-topic.ipynb new file mode 100644 index 0000000..85ff9f3 --- /dev/null +++ b/visualization/5.bert-topic.ipynb @@ -0,0 +1,897 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# !pip3 install bert-tensorflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download BERT-Base model" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip\n", + "# !unzip cased_L-12_H-768_A-12.zip" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download a simple dataset\n", + "\n", + "I want to use a negative sentiment corpus to build unsupervised topic models using attention from BERT." 
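As a side note, the vocabulary helpers added in `vectorizer/utils.py` can be exercised on a toy corpus. This is a minimal, self-contained sketch (the corpus and the 100-word vocabulary cap are illustrative, and `build_dataset` is simplified here to return only the dictionaries):

```python
import collections
import numpy as np

def build_dataset(words, n_words):
    # special tokens first, then the most frequent corpus words
    count = [['GO', 0], ['PAD', 1], ['EOS', 2], ['UNK', 3]]
    count.extend(collections.Counter(words).most_common(n_words - 1))
    dictionary = {}
    for word, _ in count:
        dictionary[word] = len(dictionary)
    reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return dictionary, reversed_dictionary

def str_idx(corpus, dic, maxlen, UNK=3):
    # right-align each sentence; unknown words map to the UNK index
    X = np.zeros((len(corpus), maxlen))
    for i, sentence in enumerate(corpus):
        for no, word in enumerate(sentence.split()[:maxlen][::-1]):
            X[i, -1 - no] = dic.get(word, UNK)
    return X

corpus = ['the movie was dull', 'dull plot']
dic, rev = build_dataset(' '.join(corpus).split(), 100)
X = str_idx(corpus, dic, maxlen=5)
print(X.shape)  # (2, 5)
```

Sentences shorter than `maxlen` are left-padded with zeros, so the last columns always hold the final words of each sentence.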
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5330" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# !wget https://raw.githubusercontent.com/huseinzol05/NLP-Models-Tensorflow/master/text-classification/data/negative/negative\n", + "\n", + "with open('negative') as fopen:\n", + " negative = fopen.read().split('\\n')[:-1]\n", + "len(negative)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import bert\n", + "from bert import run_classifier\n", + "from bert import optimization\n", + "from bert import tokenization\n", + "from bert import modeling\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "import itertools" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'\n", + "BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'\n", + "BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_ngram(seq, ngram = (1, 3)):\n", + " g = []\n", + " for i in range(ngram[0], ngram[-1] + 1):\n", + " g.extend(list(ngrams_generator(seq, i)))\n", + " return g\n", + "\n", + "def _pad_sequence(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " left_pad_symbol = None,\n", + " right_pad_symbol = None,\n", + "):\n", + " sequence = iter(sequence)\n", + " if pad_left:\n", + " sequence = itertools.chain((left_pad_symbol,) * (n - 1), sequence)\n", + " if pad_right:\n", + " sequence = itertools.chain(sequence, (right_pad_symbol,) * (n - 1))\n", + " return sequence\n", + "\n", + "\n", + "def ngrams_generator(\n", + " sequence,\n", + " n,\n", + " pad_left = False,\n", + " pad_right = False,\n", + " 
left_pad_symbol = None,\n", + "    right_pad_symbol = None,\n", + "):\n", + "    \"\"\"\n", + "    generate ngrams.\n", + "\n", + "    Parameters\n", + "    ----------\n", + "    sequence : list of str\n", + "        list of tokenized words.\n", + "    n : int\n", + "        ngram size\n", + "\n", + "    Returns\n", + "    -------\n", + "    ngram: list\n", + "    \"\"\"\n", + "    sequence = _pad_sequence(\n", + "        sequence, n, pad_left, pad_right, left_pad_symbol, right_pad_symbol\n", + "    )\n", + "\n", + "    history = []\n", + "    while n > 1:\n", + "        try:\n", + "            next_item = next(sequence)\n", + "        except StopIteration:\n", + "            return\n", + "        history.append(next_item)\n", + "        n -= 1\n", + "    for item in sequence:\n", + "        history.append(item)\n", + "        yield tuple(history)\n", + "        del history[0]\n", + "\n", + "def merge_wordpiece_tokens(paired_tokens, weighted = True):\n", + "    new_paired_tokens = []\n", + "    n_tokens = len(paired_tokens)\n", + "\n", + "    i = 0\n", + "\n", + "    while i < n_tokens:\n", + "        current_token, current_weight = paired_tokens[i]\n", + "        if current_token.startswith('##'):\n", + "            previous_token, previous_weight = new_paired_tokens.pop()\n", + "            merged_token = previous_token\n", + "            merged_weight = [previous_weight]\n", + "            while current_token.startswith('##'):\n", + "                merged_token = merged_token + current_token.replace('##', '')\n", + "                merged_weight.append(current_weight)\n", + "                i = i + 1\n", + "                if i >= n_tokens:\n", + "                    # guard against a trailing wordpiece at the end of the sequence\n", + "                    break\n", + "                current_token, current_weight = paired_tokens[i]\n", + "            merged_weight = np.mean(merged_weight)\n", + "            new_paired_tokens.append((merged_token, merged_weight))\n", + "\n", + "        else:\n", + "            new_paired_tokens.append((current_token, current_weight))\n", + "            i = i + 1\n", + "\n", + "    words = [\n", + "        i[0]\n", + "        for i in new_paired_tokens\n", + "        if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n", + "    ]\n", + "    weights = [\n", + "        i[1]\n", + "        for i in new_paired_tokens\n", + "        if i[0] not in ['[CLS]', '[SEP]', '[PAD]']\n", + "    ]\n", + "    if weighted:\n", + "        weights = np.array(weights)\n", + "        weights = weights / 
np.sum(weights)\n", + " return list(zip(words, weights))\n", + "\n", + "def _extract_attention_weights(num_layers, tf_graph):\n", + " attns = [\n", + " {\n", + " 'layer_%s'\n", + " % i: tf_graph.get_tensor_by_name(\n", + " 'bert/encoder/layer_%s/attention/self/Softmax:0' % i\n", + " )\n", + " }\n", + " for i in range(num_layers)\n", + " ]\n", + "\n", + " return attns\n", + "\n", + "def padding_sequence(seq, maxlen, padding = 'post', pad_int = 0):\n", + " padded_seqs = []\n", + " for s in seq:\n", + " if padding == 'post':\n", + " padded_seqs.append(s + [pad_int] * (maxlen - len(s)))\n", + " if padding == 'pre':\n", + " padded_seqs.append([pad_int] * (maxlen - len(s)) + s)\n", + " return padded_seqs\n", + "\n", + "\n", + "def bert_tokenization(tokenizer, texts, cls = '[CLS]', sep = '[SEP]'):\n", + "\n", + " input_ids, input_masks, segment_ids, s_tokens = [], [], [], []\n", + " for text in texts:\n", + " tokens_a = tokenizer.tokenize(text)\n", + " tokens = [cls] + tokens_a + [sep]\n", + " segment_id = [0] * len(tokens)\n", + " input_id = tokenizer.convert_tokens_to_ids(tokens)\n", + " input_mask = [1] * len(input_id)\n", + "\n", + " input_ids.append(input_id)\n", + " input_masks.append(input_mask)\n", + " segment_ids.append(segment_id)\n", + " s_tokens.append(tokens)\n", + "\n", + " maxlen = max([len(i) for i in input_ids])\n", + " input_ids = padding_sequence(input_ids, maxlen)\n", + " input_masks = padding_sequence(input_masks, maxlen)\n", + " segment_ids = padding_sequence(segment_ids, maxlen)\n", + "\n", + " return input_ids, input_masks, segment_ids, s_tokens\n", + "\n", + "class _Model:\n", + " def __init__(self, bert_config, tokenizer):\n", + " _graph = tf.Graph()\n", + " with _graph.as_default():\n", + " self.X = tf.placeholder(tf.int32, [None, None])\n", + " self._tokenizer = tokenizer\n", + "\n", + " self.model = modeling.BertModel(\n", + " config = bert_config,\n", + " is_training = False,\n", + " input_ids = self.X,\n", + " use_one_hot_embeddings = 
False,\n", + "            )\n", + "            self.logits = self.model.get_pooled_output()\n", + "            self._sess = tf.InteractiveSession()\n", + "            self._sess.run(tf.global_variables_initializer())\n", + "            var_lists = tf.get_collection(\n", + "                tf.GraphKeys.TRAINABLE_VARIABLES, scope = 'bert'\n", + "            )\n", + "            self._saver = tf.train.Saver(var_list = var_lists)\n", + "            attns = _extract_attention_weights(\n", + "                bert_config.num_hidden_layers, tf.get_default_graph()\n", + "            )\n", + "            self.attns = attns\n", + "\n", + "    def vectorize(self, strings):\n", + "\n", + "        \"\"\"\n", + "        Vectorize string inputs using the BERT pooled output.\n", + "\n", + "        Parameters\n", + "        ----------\n", + "        strings : str / list of str\n", + "\n", + "        Returns\n", + "        -------\n", + "        array: vectorized strings\n", + "        \"\"\"\n", + "\n", + "        if isinstance(strings, list):\n", + "            if not isinstance(strings[0], str):\n", + "                raise ValueError('input must be a list of strings or a string')\n", + "        else:\n", + "            if not isinstance(strings, str):\n", + "                raise ValueError('input must be a list of strings or a string')\n", + "        if isinstance(strings, str):\n", + "            strings = [strings]\n", + "\n", + "        batch_x, _, _, _ = bert_tokenization(self._tokenizer, strings)\n", + "        return self._sess.run(self.logits, feed_dict = {self.X: batch_x})\n", + "\n", + "    def attention(self, strings, method = 'last', **kwargs):\n", + "        \"\"\"\n", + "        Get per-token attention weights for string inputs from BERT attention.\n", + "\n", + "        Parameters\n", + "        ----------\n", + "        strings : str / list of str\n", + "        method : str, optional (default='last')\n", + "            Attention layer supported. 
Allowed values:\n", + "\n", + " * ``'last'`` - attention from last layer.\n", + " * ``'first'`` - attention from first layer.\n", + " * ``'mean'`` - average attentions from all layers.\n", + "\n", + " Returns\n", + " -------\n", + " array: attention\n", + " \"\"\"\n", + "\n", + " if isinstance(strings, list):\n", + " if not isinstance(strings[0], str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " else:\n", + " if not isinstance(strings, str):\n", + " raise ValueError('input must be a list of strings or a string')\n", + " if isinstance(strings, str):\n", + " strings = [strings]\n", + "\n", + " method = method.lower()\n", + " if method not in ['last', 'first', 'mean']:\n", + " raise Exception(\n", + " \"method not supported, only support 'last', 'first' and 'mean'\"\n", + " )\n", + "\n", + " batch_x, _, _, s_tokens = bert_tokenization(self._tokenizer, strings)\n", + " maxlen = max([len(s) for s in s_tokens])\n", + " s_tokens = padding_sequence(s_tokens, maxlen, pad_int = '[SEP]')\n", + " attentions = self._sess.run(self.attns, feed_dict = {self.X: batch_x})\n", + " if method == 'first':\n", + " cls_attn = list(attentions[0].values())[0][:, :, 0, :]\n", + "\n", + " if method == 'last':\n", + " cls_attn = list(attentions[-1].values())[0][:, :, 0, :]\n", + "\n", + " if method == 'mean':\n", + " combined_attentions = []\n", + " for a in attentions:\n", + " combined_attentions.append(list(a.values())[0])\n", + " cls_attn = np.mean(combined_attentions, axis = 0).mean(axis = 2)\n", + "\n", + " cls_attn = np.mean(cls_attn, axis = 1)\n", + " total_weights = np.sum(cls_attn, axis = -1, keepdims = True)\n", + " attn = cls_attn / total_weights\n", + " output = []\n", + " for i in range(attn.shape[0]):\n", + " output.append(\n", + " merge_wordpiece_tokens(list(zip(s_tokens[i], attn[i])))\n", + " )\n", + " return output" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + 
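The pooling inside `attention` can be illustrated with plain NumPy. A random tensor stands in for one layer's softmaxed attention (BERT-Base shapes: `(batch, heads, seq_len, seq_len)`), so this is only a sketch of the `method='last'` path; the values are illustrative:

```python
import numpy as np

# Take the row where [CLS] attends to every position, average over heads,
# then normalise so each sentence's token weights sum to 1.
rng = np.random.default_rng(0)
attn = rng.random((2, 12, 5, 5))   # batch=2, 12 heads, 5 tokens

cls_attn = attn[:, :, 0, :]        # attention *from* [CLS]: (2, 12, 5)
cls_attn = cls_attn.mean(axis=1)   # average the heads: (2, 5)
weights = cls_attn / cls_attn.sum(axis=-1, keepdims=True)

print(weights.shape)               # (2, 5)
```

The normalised rows are what `merge_wordpiece_tokens` then zips back onto the wordpiece tokens.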
"output_type": "stream", + "text": [ + "W0820 00:50:25.676800 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/tokenization.py:125: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.\n", + "\n", + "W0820 00:50:25.755635 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:171: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.\n", + "\n", + "W0820 00:50:25.757595 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:409: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.\n", + "\n", + "W0820 00:50:25.783736 139771824637760 deprecation_wrapper.py:119] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:490: The name tf.assert_less_equal is deprecated. Please use tf.compat.v1.assert_less_equal instead.\n", + "\n", + "W0820 00:50:26.212700 139771824637760 lazy_loader.py:50] \n", + "The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", + "For more information, please see:\n", + " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", + " * https://github.com/tensorflow/addons\n", + " * https://github.com/tensorflow/io (for I/O related ops)\n", + "If you depend on functionality not listed there, please file an issue.\n", + "\n", + "W0820 00:50:26.247612 139771824637760 deprecation.py:323] From /home/husein/.local/lib/python3.6/site-packages/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", + "Instructions for updating:\n", + "Use keras.layers.dense instead.\n" + ] + } + ], + "source": [ + "tokenizer = tokenization.FullTokenizer(vocab_file=BERT_VOCAB, do_lower_case=False)\n", + "bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)\n", + "model = 
_Model(bert_config, tokenizer)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test vectorization" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 0.55213624, -0.33787724, 0.74862313, ..., -0.04363263,\n", + " 0.31521446, 0.07524541],\n", + " [ 0.59046894, -0.304328 , 0.7821516 , ..., -0.16189037,\n", + " 0.367751 , 0.07440313]], dtype=float32)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v = model.vectorize(['hello nice to meet u', 'so long sucker'])\n", + "v" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 768)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "v.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test attention" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[('hello', 0.19323255),\n", + " ('nice', 0.19877374),\n", + " ('to', 0.19795448),\n", + " ('meet', 0.20197453),\n", + " ('u', 0.20806469)],\n", + " [('so', 0.34224316), ('long', 0.31957355), ('sucker', 0.3381833)]]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.attention(['hello nice to meet u', 'so long sucker'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Building topic modeling" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "batch_size = 10\n", + "ngram = (1, 3)\n", + "n_topics = 10" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 533/533 
[01:11<00:00,  7.32it/s]\n" + ] + } + ], + "source": [ + "from sklearn.cluster import KMeans\n", + "from tqdm import tqdm\n", + "\n", + "rows, attentions = [], []\n", + "for i in tqdm(range(0, len(negative), batch_size)):\n", + "    index = min(i + batch_size, len(negative))\n", + "    rows.append(model.vectorize(negative[i:index]))\n", + "    attentions.extend(model.attention(negative[i:index]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download simple English stopwords\n", + "\n", + "You might want to gather more stopwords." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# !wget https://raw.githubusercontent.com/stopwords-iso/stopwords-en/master/stopwords-en.json" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1298" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import json\n", + "with open('stopwords-en.json') as fopen:\n", + "    stopwords = json.load(fopen)\n", + "len(stopwords)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[\"'ll\", \"'tis\", \"'twas\", \"'ve\", '10']" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stopwords[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "processed 500\n", + "processed 1000\n", + "processed 1500\n", + "processed 2000\n", + "processed 2500\n", + "processed 3000\n", + "processed 3500\n", + "processed 4000\n", + "processed 4500\n", + "processed 5000\n" + ] + } + ], + "source": [ + "concat = np.concatenate(rows, axis = 0)\n", + "kmeans = KMeans(n_clusters = n_topics, random_state = 0).fit(concat)\n", + "labels = 
kmeans.labels_\n", + "\n", + "overall, filtered_a = [], []\n", + "for a in attentions:\n", + "    f = [i for i in a if i[0] not in stopwords]\n", + "    overall.extend(f)\n", + "    filtered_a.append(f)\n", + "\n", + "o_ngram = generate_ngram(overall, ngram)\n", + "features = []\n", + "for i in o_ngram:\n", + "    features.append(' '.join([w[0] for w in i]))\n", + "features = list(set(features))\n", + "\n", + "components = np.zeros((n_topics, len(features)))\n", + "for no, i in enumerate(labels):\n", + "    if (no + 1) % 500 == 0:\n", + "        print('processed %d'%(no + 1))\n", + "    f = generate_ngram(filtered_a[no], ngram)\n", + "    for w in f:\n", + "        word = ' '.join([r[0] for r in w])\n", + "        score = np.mean([r[1] for r in w])\n", + "        if word in features:\n", + "            components[i, features.index(word)] += score" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "def print_topics_modelling(\n", + "    topics, feature_names, sorting, n_words = 20, return_df = True\n", + "):\n", + "    if return_df:\n", + "        try:\n", + "            import pandas as pd\n", + "        except ImportError:\n", + "            raise Exception(\n", + "                'pandas not installed. Please install it and try again or set `return_df = False`'\n", + "            )\n", + "    df = {}\n", + "    for i in range(topics):\n", + "        words = []\n", + "        for k in range(n_words):\n", + "            words.append(feature_names[sorting[i, k]])\n", + "        df['topic %d' % (i)] = words\n", + "    if return_df:\n", + "        return pd.DataFrame.from_dict(df)\n", + "    else:\n", + "        return df" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
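The clustering-plus-accumulation step can be reduced to a toy version: cluster sentence vectors with KMeans, then add each sentence's (token, attention-weight) pairs into its cluster's row. Random vectors stand in for the BERT pooled outputs here, and the token/weight pairs are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
vectors = rng.random((6, 8))                  # 6 "sentences", dim 8
tokens = [[('movie', 0.6), ('bad', 0.4)],
          [('plot', 0.5), ('dull', 0.5)]] * 3  # attention per sentence

n_topics = 2
labels = KMeans(n_clusters=n_topics, random_state=0, n_init=10).fit(vectors).labels_

features = sorted({w for sent in tokens for w, _ in sent})
components = np.zeros((n_topics, len(features)))
for label, sent in zip(labels, tokens):
    for word, score in sent:
        # accumulate attention mass into this sentence's cluster row
        components[label, features.index(word)] += score

# highest-scoring words per topic, mirroring print_topics_modelling
sorting = np.argsort(components)[:, ::-1]
print([[features[j] for j in sorting[i, :2]] for i in range(n_topics)])
```

The full notebook does the same thing with n-grams of the attention-weighted tokens instead of single words.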
(HTML rendering of the topics DataFrame, garbled during extraction; it duplicates the text/plain table that follows)
" + ], + "text/plain": [ + " topic 0 topic 1 topic 2 topic 3 topic 4 topic 5 \\\n", + "0 movie movie movie movie movie movie \n", + "1 film film film film film film \n", + "2 comedy characters bad plot story bad \n", + "3 bad story story bad time dull \n", + "4 lame time films comedy director story \n", + "5 dull films comedy movies movies action \n", + "6 silly bad time story bad comedy \n", + "7 mess minutes characters time comedy thriller \n", + "8 pretentious action movies characters characters characters \n", + "9 stupid plot plot hard reason feels \n", + "\n", + " topic 6 topic 7 topic 8 topic 9 \n", + "0 movie movie movie movie \n", + "1 film film film film \n", + "2 characters characters story story \n", + "3 time story comedy films \n", + "4 story feels bad characters \n", + "5 comedy comedy boring time \n", + "6 action love tale comedy \n", + "7 script script dull bad \n", + "8 films character predictable script \n", + "9 director action movies action " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print_topics_modelling(\n", + " 10,\n", + " feature_names = np.array(features),\n", + " sorting = np.argsort(components)[:, ::-1],\n", + " n_words = 10,\n", + " return_df = True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}